TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data

Theobald, Martin and Bast, Holger and Majumdar, Debapriyo and Schenkel, Ralf and Weikum, Gerhard (2008) TopX: Efficient and Versatile Top-k Query Processing for Semistructured Data. The VLDB Journal, 17 (2). pp. 81-115.

Full text not available from this repository.


Recent IR extensions to XML query languages, such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series, reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation, with controllable influence on the result ranking. The main contributions of this paper are fourfold: 1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, 2) efficient and effective top-k query processing for semistructured data, 3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and 4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora such as TREC Terabyte and INEX Wikipedia.
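The idea of stopping as soon as the top k results are provably safe can be illustrated with Fagin's Threshold Algorithm (TA), a classic top-k method for monotonic score aggregation. This is a minimal sketch, not TopX's own algorithm. It assumes sum as the aggregation function, nonnegative scores, and in-memory index lists (one per query dimension, sorted by descending score) with simulated random access. The function name `threshold_topk` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_topk(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Sketch of Fagin's Threshold Algorithm with sum aggregation (illustrative only).

    Each entry of `lists` is one index list for a query dimension, sorted by
    descending score. Items absent from a list are assumed to score 0 there.
    """
    # Simulated random access: look up an item's score in each list directly.
    lookup = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        last_scores = []
        for lst in lists:
            if depth < len(lst):
                item, score = lst[depth]
                last_scores.append(score)
                if item not in seen:
                    # Full aggregated score via random access into every list.
                    seen[item] = sum(d.get(item, 0.0) for d in lookup)
            else:
                # Exhausted list: unseen items can score at most 0 here.
                last_scores.append(0.0)

        # Upper bound on the aggregated score of any item not yet seen.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Safe to stop: the k-th best seen score already reaches the bound.
        if len(top) == k and top[-1][1] >= threshold:
            return top

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    index_lists = [
        [("e1", 0.9), ("e2", 0.8), ("e3", 0.3)],
        [("e2", 0.9), ("e3", 0.6), ("e1", 0.1)],
    ]
    print(threshold_topk(index_lists, 2))
```

With `k = 2`, the sketch stops after the third round of sorted accesses, once the second-best score (about 1.0 for `e1`) is at least the threshold (0.3 + 0.1). The monotonicity of the aggregation function is what makes this bound valid.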

Item Type: Article
Subjects: DBIS Research > Publications
ID Code: 1256
Deposited By: Prof. Dr. Martin Theobald
Deposited On: 28 Aug 2015 19:49
Last Modified: 28 Aug 2015 19:49
