Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

Mavroeidis, Dimitrios and Tsatsaronis, George and Vazirgiannis, Michalis and Theobald, Martin and Weikum, Gerhard (2005) Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification. In: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005).

Full text not available from this repository.

Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance on the text classification task by extending the traditional "bag of words" representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various human-disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora, achieving a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
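The abstract does not give the kernel's definition, so the following is only a minimal sketch of the general GVSM idea it refers to, not the paper's actual kernel. In a GVSM-style kernel, each bag-of-words vector d is projected through a term-by-concept matrix D, and the kernel is the inner product of the projections, K(a, b) = (a D) . (b D). The vocabulary, the concept list, the matrix entries, and all function names below are hypothetical toy values chosen for illustration.

```python
# Hypothetical toy data (not from the paper): rows are terms, columns are
# thesaurus concepts. An entry is 1.0 when the term maps to that concept
# or to one of its ancestors in the hierarchy.
TERMS = ["dog", "cat", "car"]
CONCEPTS = ["canine", "feline", "animal", "vehicle"]
TERM_TO_CONCEPT = [
    [1.0, 0.0, 1.0, 0.0],  # dog -> canine, animal
    [0.0, 1.0, 1.0, 0.0],  # cat -> feline, animal
    [0.0, 0.0, 0.0, 1.0],  # car -> vehicle
]


def to_concept_space(doc_vector):
    """Project a bag-of-words vector onto concept space: d -> d D."""
    return [
        sum(doc_vector[t] * TERM_TO_CONCEPT[t][c] for t in range(len(TERMS)))
        for c in range(len(CONCEPTS))
    ]


def gvsm_kernel(doc_a, doc_b):
    """GVSM-style kernel: inner product of the two concept-space projections."""
    proj_a = to_concept_space(doc_a)
    proj_b = to_concept_space(doc_b)
    return sum(x * y for x, y in zip(proj_a, proj_b))


def bow_kernel(doc_a, doc_b):
    """Plain bag-of-words kernel: inner product of the raw term vectors."""
    return sum(x * y for x, y in zip(doc_a, doc_b))


# One-word documents over the toy vocabulary (term order as in TERMS).
dog_doc = [1.0, 0.0, 0.0]
cat_doc = [0.0, 1.0, 0.0]
car_doc = [0.0, 0.0, 1.0]

print(bow_kernel(dog_doc, cat_doc))   # no shared terms
print(gvsm_kernel(dog_doc, cat_doc))  # shared ancestor concept "animal"
print(gvsm_kernel(dog_doc, car_doc))  # no shared concepts
```

In this toy setup the plain bag-of-words kernel sees "dog" and "cat" documents as unrelated, while the concept-space kernel gives them a nonzero similarity through the shared ancestor. Because the kernel is an inner product of projected vectors, it is positive semidefinite and could in principle be passed to an SVM as a precomputed kernel.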

Item Type: Conference or Workshop Item (Paper)
Subjects: DBIS Research > Publications
Divisions: Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Information Systems > DBIS Research and Teaching > DBIS Research > Publications
Depositing User: Prof. Dr. Martin Theobald
Date Deposited: 09 Sep 2015 20:04
Last Modified: 09 Sep 2015 20:04
URI: http://dbis.eprints.uni-ulm.de/id/eprint/1278
