TopX 2.0 at the INEX 2008 Efficiency Track

Theobald, Martin and AbuJarour, Mohammed and Schenkel, Ralf (2008) TopX 2.0 at the INEX 2008 Efficiency Track. In: 7th International Workshop of the Initiative for the Evaluation of XML Retrieval.

Full text not available from this repository.

Abstract

For the INEX Efficiency Track 2008, we were just in time to finish and, for the first time, evaluate our brand-new TopX 2.0 prototype. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to a compressed, object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. The core of the new engine is a multiple-nested block-index structure that seamlessly integrates top-k-style sorted access to large blocks stored as inverted files on disk with in-memory merge-joins for efficient score aggregation. The main challenge in designing this new index structure was to marry no fewer than three different paradigms in search engine design: 1) sorting blocks in descending order of the maximum element score they contain, for threshold-based candidate pruning and top-k-style early termination; 2) sorting elements within each block by their id, to support efficient in-memory merge-joins; and 3) encoding both structural and content-related information into a single, unified index structure. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. Using a top-15, focused (i.e., non-overlapping) retrieval mode, TopX 2.0 processes the entire batch of 568 Efficiency Track topics in less than 51 seconds in their content-and-structure (CAS) version and in less than 29 seconds in their content-only (CO) version, an average of merely 89 ms per CAS query and 49 ms per CO query.
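The abstract names the ingredients of the block index (blocks ordered by descending maximum score, elements inside a block ordered by id, merge-joins for score aggregation, threshold-based early termination) but not the exact layout or algorithm. The Python sketch below is therefore only an illustration under assumptions of our own, not the TopX 2.0 implementation (which is written in C++): `Block`, `build_blocks` and `top_k` are hypothetical names, scores are aggregated by a plain sum, termination uses a generic NRA-style (sorted access only) threshold test, and the structural part of the index (paradigm 3, needed for CAS queries) is left out entirely.

```python
import heapq
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter


@dataclass
class Block:
    # Hypothetical layout: postings sorted by element id for merge-joins,
    # plus the highest score in the block for threshold-based pruning.
    postings: list[tuple[int, float]]  # (element_id, score), ascending id
    max_score: float


def build_blocks(postings, block_size):
    """Cut one term's (element_id, score) postings into id-sorted blocks.

    Assumption: blocks are cut from the postings in descending score order,
    so block max scores never increase from one block to the next.
    """
    by_score = sorted(postings, key=lambda p: -p[1])
    blocks = []
    for i in range(0, len(by_score), block_size):
        chunk = by_score[i:i + block_size]
        blocks.append(Block(sorted(chunk), chunk[0][1]))
    return blocks


def top_k(term_blocks, k):
    """NRA-style top-k over one block list per query term.

    Returns (the k best (element_id, worstscore) pairs, rounds used).
    Worstscores are lower bounds; the top-k *set* is exact (up to ties).
    """
    m = len(term_blocks)
    worst = {}  # element_id -> sum of scores seen so far
    seen = {}   # element_id -> indexes of terms already seen for it
    pos = [0] * m
    rounds = 0
    while True:
        # Sorted access: read the next block (highest remaining max_score)
        # of every term whose list is not yet exhausted.
        batch = []
        for t in range(m):
            if pos[t] < len(term_blocks[t]):
                blk = term_blocks[t][pos[t]]
                pos[t] += 1
                batch.append([(eid, t, s) for eid, s in blk.postings])
        if not batch:
            break  # all lists exhausted: every score is now exact
        rounds += 1
        # In-memory merge-join: the fetched blocks are id-sorted, so a
        # multi-way merge groups all postings of an element in one pass.
        for eid, group in groupby(heapq.merge(*batch), key=itemgetter(0)):
            hits = [(t, s) for _, t, s in group]
            worst[eid] = worst.get(eid, 0.0) + sum(s for _, s in hits)
            seen.setdefault(eid, set()).update(t for t, _ in hits)
        # high[t] bounds any score of term t that has not been read yet.
        high = [term_blocks[t][pos[t]].max_score
                if pos[t] < len(term_blocks[t]) else 0.0 for t in range(m)]
        ranked = sorted(worst.items(), key=lambda kv: (-kv[1], kv[0]))
        if len(ranked) < k:
            continue
        min_k = ranked[k - 1][1]
        # Best possible score of a completely unseen element ...
        best_other = sum(high)
        # ... and of every seen candidate outside the current top-k.
        for eid, w in ranked[k:]:
            missing = sum(high[t] for t in range(m) if t not in seen[eid])
            best_other = max(best_other, w + missing)
        if min_k >= best_other:
            break  # threshold test: no other element can enter the top-k
    ranked = sorted(worst.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k], rounds
```

For instance, with two toy term lists `[(1, 0.9), (2, 0.8), (3, 0.1), (4, 0.05)]` and `[(2, 0.9), (1, 0.7), (5, 0.2), (3, 0.1)]` cut into blocks of size 2, `top_k(..., 2)` ranks element 2 ahead of element 1 and stops after a single round, because the unread blocks can contribute at most 0.1 + 0.2 to any other element. Fetching whole blocks rather than single postings trades a little extra reading for sequential disk access and cheap id-ordered merging, which is the kind of trade-off the abstract describes.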

Item Type: Conference or Workshop Item (Paper)
Subjects: DBIS Research > Publications
Divisions: Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Information Systems > DBIS Research and Teaching > DBIS Research > Publications
Depositing User: Prof. Dr. Martin Theobald
Date Deposited: 09 Sep 2015 19:49
Last Modified: 09 Sep 2015 19:49
URI: http://dbis.eprints.uni-ulm.de/id/eprint/1255