Theobald, Martin and Schenkel, Ralf and Weikum, Gerhard (2005) An Efficient and Versatile Query Engine for TopX Search. In: 31st International Conference on Very Large Data Bases (VLDB 2005).
Full text not available from this repository.

Abstract
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing top-k algorithms to XML data lie in 1) the need to consider scores for XML elements while aggregating them at the document level, 2) the combination of vague content conditions with XML path conditions, 3) the need to relax query conditions if too few results satisfy all conditions, and 4) the selectivity estimation for both content and structure conditions and their impact on evaluation strategies. TopX addresses these issues by precomputing score and path information in an appropriately designed index structure, by largely avoiding or postponing the evaluation of expensive path conditions so as to preserve the sequential access pattern on index lists, and by selectively scheduling random accesses when they are cost-beneficial. In addition, TopX can compute approximate top-k results using probabilistic score estimators, thus speeding up queries with a small and controllable loss in retrieval precision.
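For readers unfamiliar with the threshold-algorithm paradigm the abstract refers to, the following is a minimal sketch of a generic threshold algorithm for top-k retrieval over score-sorted index lists. It is not the TopX engine: it ignores XML elements, path conditions, query relaxation, and probabilistic pruning, and it performs a random access for every newly seen document rather than scheduling them selectively. The function name `threshold_top_k`, the dictionary-based index layout, and the use of summation as the score aggregation are all assumptions made for illustration.

```python
import heapq


def threshold_top_k(index_lists, k):
    """Generic threshold-algorithm sketch (illustrative, not TopX itself).

    index_lists: dict mapping a query term to a list of (doc_id, score)
                 pairs sorted by descending score (assumed layout).
    Returns up to k (doc_id, total_score) pairs, best first, where the
    total score is the sum of the per-term scores (assumed aggregation).
    """
    # Hypothetical random-access structure: term -> {doc_id: score}.
    lookup = {term: dict(entries) for term, entries in index_lists.items()}
    seen = set()
    top = []  # min-heap of (total_score, doc_id) holding the current top-k
    max_depth = max((len(e) for e in index_lists.values()), default=0)

    for depth in range(max_depth):
        threshold = 0.0  # best total any still-unseen document could reach
        for term, entries in index_lists.items():
            if depth >= len(entries):
                continue  # exhausted list contributes nothing further
            doc_id, score = entries[depth]  # sequential (sorted) access
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Random accesses: fetch this document's score in every list.
            total = sum(lookup[t].get(doc_id, 0.0) for t in index_lists)
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        # Stop once the k-th best total is at least the threshold: no
        # unseen document can still enter the result.
        if len(top) == k and top[0][0] >= threshold:
            break

    return [(doc, total) for total, doc in sorted(top, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Toy index lists with made-up scores for two hypothetical terms.
    toy_index = {
        "xml": [("d1", 0.75), ("d2", 0.5), ("d3", 0.125)],
        "retrieval": [("d2", 0.75), ("d3", 0.5), ("d1", 0.25)],
    }
    print(threshold_top_k(toy_index, 2))
```

The stopping test is what keeps the scan shallow: the lists are read only as deep as needed for the k-th best total to reach the threshold. An engine that favors sequential accesses, as the abstract describes, would instead maintain score bounds for partially seen candidates and issue random accesses only when they are judged cost-beneficial.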
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Subjects: | DBIS Research > Publications |
Divisions: | Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Informations Systems > DBIS Research and Teaching > DBIS Research > Publications |
Depositing User: | Prof. Dr. Martin Theobald |
Date Deposited: | 09 Sep 2015 19:56 |
Last Modified: | 09 Sep 2015 19:56 |
URI: | http://dbis.eprints.uni-ulm.de/id/eprint/1275 |