Abstract

In recent years, social media has become an important part of everyday life for many people. A major challenge in social media is finding posts that are interesting to the user. Many social networks, such as Twitter, handle this problem with so-called hashtags. A user can label their own tweet (post) with a hashtag, and other users can search for posts containing a specified hashtag. But what about finding posts that were not labeled by their creator? We provide a way of completing hashtags for unlabeled posts using classification on a novel real-world Twitter data stream. New posts are created every second, so this setting is a natural fit for non-stationary data analysis. Our goal is to show how labels (hashtags) of social media posts can be predicted by streaming classifiers. In particular, we employ Random Projection (RP) as a preprocessing step in calculating streaming models. We also provide a novel real-world data set for streaming analysis, called NSDQ, with a comprehensive data description, and we show that this data set is a real challenge for state-of-the-art stream classifiers. While RP has been widely used and evaluated in stationary data analysis scenarios, its use in non-stationary environments has not been well analyzed. In this paper we provide a use case of RP on real-world streaming data, in particular on the NSDQ data set. We discuss why RP can be used in this scenario and how it can handle stream-specific situations such as concept drift. We also provide experiments with RP on streaming data, using state-of-the-art streaming classifiers such as Adaptive Random Forest, as well as concept drift detectors.
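To make the idea of RP as a preprocessing step for a streaming model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian random projection matrix and uses a simple online perceptron as a stand-in for a stream classifier such as Adaptive Random Forest. The names `make_projection`, `project`, `OnlinePerceptron` and `prequential_accuracy` are hypothetical, and any stream passed in would be synthetic illustration data, not NSDQ. Each sample is projected, used for prediction first, and only then used for training (test-then-train evaluation).

```python
import random


def make_projection(d_in, d_out, seed=0):
    """Return a d_out x d_in Gaussian random projection matrix.

    Entries are drawn from N(0, 1/d_out), a common scaling choice
    (assumed here; the abstract does not specify the RP variant).
    """
    rng = random.Random(seed)
    scale = 1.0 / d_out ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(R, x):
    """Map a d_in-dimensional sample x to d_out dimensions via R."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


class OnlinePerceptron:
    """Binary online perceptron, a stand-in for a real stream classifier."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, z):
        score = sum(w * v for w, v in zip(self.w, z)) + self.b
        return 1 if score >= 0 else 0

    def learn(self, z, y):
        # Update weights only when the current prediction is wrong.
        err = y - self.predict(z)
        if err != 0:
            self.w = [w + self.lr * err * v for w, v in zip(self.w, z)]
            self.b += self.lr * err


def prequential_accuracy(stream, R, model):
    """Test-then-train: predict each projected sample, then learn from it."""
    correct = total = 0
    for x, y in stream:
        z = project(R, x)          # RP is applied once per incoming sample
        correct += int(model.predict(z) == y)
        model.learn(z, y)
        total += 1
    return correct / total if total else 0.0
```

Because the projection matrix is fixed in advance and independent of the data, it does not need to be refit when the stream's distribution changes; only the downstream model has to adapt.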
