Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Unnikrishnan, Vishnu and Beyer, Christian and Matuszyk, Pawel and Niemann, Uli and Pryss, Rüdiger and Schlee, Winfried and Ntoutsi, Eirini and Spiliopoulou, Myra (2020) Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics, 9. pp. 1-15.

Abstract

Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
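
The idea outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean distance on entity feature vectors, and the unweighted majority vote are all assumptions made for illustration. Each entity has a feature vector and a small seed of labelled observations; a prediction for an entity pools its own seed labels with those of its k most similar entities, while the kRE-style baseline pools the labels of k randomly chosen entities instead.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_label(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def predict_knn_entities(target, entity_features, seed_labels, k):
    """Vote over the seed labels of `target` and its k most similar entities.

    Assumption: similarity is Euclidean distance on entity features and
    every pooled label counts equally.
    """
    others = [e for e in entity_features if e != target]
    others.sort(key=lambda e: euclidean(entity_features[target], entity_features[e]))
    pool = list(seed_labels[target])
    for e in others[:k]:
        pool.extend(seed_labels[e])
    return majority_label(pool)


def predict_k_random_entities(target, entity_features, seed_labels, k, rng):
    """Baseline: same vote, but the k other entities are drawn at random."""
    others = [e for e in entity_features if e != target]
    pool = list(seed_labels[target])
    for e in rng.sample(others, min(k, len(others))):
        pool.extend(seed_labels[e])
    return majority_label(pool)


# Hypothetical toy data: four entities, each with a few seed labels.
entity_features = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (5.0, 5.0),
    "d": (5.1, 5.0),
}
seed_labels = {
    "a": ["pos"],
    "b": ["pos", "pos"],
    "c": ["neg", "neg"],
    "d": ["neg", "neg", "neg"],
}

knn_a = predict_knn_entities("a", entity_features, seed_labels, k=1)
knn_c = predict_knn_entities("c", entity_features, seed_labels, k=1)
kre_a = predict_k_random_entities("a", entity_features, seed_labels, k=1,
                                  rng=random.Random(0))
```

Comparing the two predictors on held-out future observations is one way to check whether entity similarity carries useful signal: if the similarity-based vote does no better than the random-entity vote, the entity-level knowledge is probably not helping.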

Item Type: Article
Subjects: DBIS Research > Publications
Divisions: Faculty of Engineering, Electronics and Computer Science > Institute of Databases and Information Systems > DBIS Research and Teaching > DBIS Research > Publications
Depositing User: Ruediger Pryss
Date Deposited: 18 Jun 2020 12:55
Last Modified: 18 Jun 2020 12:55
URI: http://dbis.eprints.uni-ulm.de/id/eprint/1917
