Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

Unnikrishnan, Vishnu and Beyer, Christian and Matuszyk, Pawel and Niemann, Uli and Pryss, Rüdiger and Schlee, Winfried and Ntoutsi, Eirini and Spiliopoulou, Myra (2020) Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity. International Journal of Data Science and Analytics, 9. pp. 1-15.


Official URL: https://link.springer.com/article/10.1007/s41060-019-00177-1


Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets.
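
The voting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each entity has a numeric profile vector, that entity similarity is Euclidean distance between profiles, and that a prediction is a majority vote over the labelled seed observations of the target entity and its neighbours. The names nearest_entities, random_entities, predict_label, profiles and seed_labels are hypothetical, and the data is made up for illustration.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_entities(target, profiles, k):
    """Return the k entities whose profiles are closest to the target's.

    Assumption: similarity is plain Euclidean distance on entity profiles.
    """
    others = [e for e in profiles if e != target]
    others.sort(key=lambda e: euclidean(profiles[target], profiles[e]))
    return others[:k]


def random_entities(target, profiles, k, rng):
    """Baseline in the spirit of kRE: pick k other entities at random."""
    others = sorted(e for e in profiles if e != target)
    return rng.sample(others, min(k, len(others)))


def predict_label(target, neighbours, seed_labels):
    """Majority vote over the seed labels of the target and its neighbours."""
    votes = Counter(seed_labels[target])
    for entity in neighbours:
        votes.update(seed_labels[entity])
    return votes.most_common(1)[0][0]


# Made-up example: four entities, each with m = 3 labelled seed observations.
profiles = {
    "a": (0.0, 0.0),
    "b": (0.1, 0.0),
    "c": (5.0, 5.0),
    "d": (5.1, 5.0),
}
seed_labels = {
    "a": ["pos", "pos", "neg"],
    "b": ["pos", "pos", "pos"],
    "c": ["neg", "neg", "neg"],
    "d": ["neg", "neg", "pos"],
}

similar_to_a = nearest_entities("a", profiles, k=1)
label_for_a = predict_label("a", similar_to_a, seed_labels)

random_for_a = random_entities("a", profiles, k=2, rng=random.Random(0))
baseline_label_for_a = predict_label("a", random_for_a, seed_labels)
```

Comparing label_for_a with baseline_label_for_a over many entities gives a rough indication of whether entity similarity contributes anything beyond a random choice of substreams, which is the role the abstract assigns to kRE.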

Item Type: Article
Subjects: DBIS Research > Publications
ID Code: 1917
Deposited By: Ruediger Pryss
Deposited On: 18 Jun 2020 12:55
Last Modified: 18 Jun 2020 12:55
