TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

Theobald, Martin and Bast, Holger and Majumdar, Debapriyo and Schenkel, Ralf and Weikum, Gerhard (2008) TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. The VLDB Journal, 17 (2). pp. 81-115.

Full text not available from this repository.


Recent IR extensions to XML query languages such as Xpath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dy\-namic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-k query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and łinebreak query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.

Item Type: Article
Subjects: DBIS Research > Publications
Divisions: Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Informations Systems > DBIS Research and Teaching > DBIS Research > Publications
Depositing User: Prof. Dr. Martin Theobald
Date Deposited: 28 Aug 2015 19:49
Last Modified: 28 Aug 2015 19:49

Actions (login required)

View Item
View Item