Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data

Theobald, Martin and Schenkel, Ralf and Weikum, Gerhard (2003) Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML Data. In: 6th International Workshop on the Web and Databases.

Full text not available from this repository.

Abstract

This paper investigates how to automatically classify non-schematic XML data into a user-defined topic directory. The main focus is on constructing appropriate feature spaces on which a classifier operates. In addition to the usual text-based term frequency vectors, we study XML twigs and tag paths as extended features that can be combined with text term occurrences in XML elements. Moreover, we show how to leverage ontological background information, more specifically, the WordNet thesaurus, for the construction of more expressive feature spaces. For efficiency, our implementation computes features incrementally and caches ontology entries. Our experiments demonstrate the improved accuracy of automatic classification based on the enhanced feature spaces.
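The abstract names several feature kinds but does not define them exactly. The following Python sketch shows one possible way, under our own assumptions, to turn an XML document into a sparse feature-frequency vector. It covers plain text terms, root-to-element tag paths, terms qualified by their enclosing tag, and a simplified "twig" (an element together with the sorted tags of its direct children). It also adds concept features from a cached ontology lookup. The function names, the feature prefixes, the tokenizer, the twig definition, and the tiny `TOY_ONTOLOGY` dictionary (a hypothetical stand-in for WordNet) are all illustrative. This is not the authors' implementation.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from functools import lru_cache

TOKEN_RE = re.compile(r"[a-z0-9]+")

# Hypothetical stand-in for a thesaurus such as WordNet: maps terms to a
# shared concept identifier so that synonyms contribute to one feature.
TOY_ONTOLOGY = {"car": "car_concept", "automobile": "car_concept"}


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens (simplistic, assumed)."""
    return TOKEN_RE.findall((text or "").lower())


@lru_cache(maxsize=10_000)
def concept_of(term):
    """Cached ontology lookup; a real system would query the thesaurus here."""
    return TOY_ONTOLOGY.get(term)


def extract_features(xml_text):
    """Build a sparse feature-frequency vector (a Counter) from one XML document.

    Feature kinds (prefixes are hypothetical, chosen for this sketch):
      term:<t>                   plain text term, as in a bag-of-words vector
      path:<p>                   root-to-element tag path
      tagterm:<tag>/<t>          text term qualified by its enclosing element tag
      twig:<tag>(<c1>,<c2>,...)  element plus the sorted tags of its children
      concept:<c>                ontology concept for a term, if one is known
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        features[f"path:{path}"] += 1

        # Text owned by this element: its leading text plus the tail text
        # that follows each child element.
        children = list(elem)
        texts = [elem.text] + [child.tail for child in children]
        for token in tokenize(" ".join(t for t in texts if t)):
            features[f"term:{token}"] += 1
            features[f"tagterm:{elem.tag}/{token}"] += 1
            concept = concept_of(token)
            if concept is not None:
                features[f"concept:{concept}"] += 1

        if children:
            child_tags = ",".join(sorted(c.tag for c in children))
            features[f"twig:{elem.tag}({child_tags})"] += 1
        for child in children:
            visit(child, path)

    visit(root, "")
    return features


if __name__ == "__main__":
    doc = ("<article><title>XML classification</title><author>Smith</author>"
           "<body>Ontology based XML features</body></article>")
    for feature, count in sorted(extract_features(doc).items()):
        print(feature, count)
```

In this sketch, the same term appearing under different tags yields one shared `term:` feature and several distinct `tagterm:` features. Both kinds can be kept in one vector, which mirrors the idea of combining structural features with text term occurrences. Caching with `functools.lru_cache` reflects the abstract's remark about caching ontology entries. Which classifier consumes these vectors is not stated in the abstract.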

Item Type: Conference or Workshop Item (Paper)
Subjects: DBIS Research > Publications
Divisions: Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Information Systems > DBIS Research and Teaching > DBIS Research > Publications
Depositing User: Prof. Dr. Martin Theobald
Date Deposited: 09 Sep 2015 20:06
Last Modified: 09 Sep 2015 20:06
URI: http://dbis.eprints.uni-ulm.de/id/eprint/1284
