Abstract

Real-world applications, especially social media, produce massive streams of short texts. Unlike conventional texts, these data are short, high-volume, and high-velocity, and their distribution changes over time, which leads to data sparsity and concept drift. These properties make classification very challenging for existing short text classification algorithms. We therefore propose a short text stream classification approach based on a flexible Long Short-Term Memory (LSTM) ensemble network, which is implemented in a distributed mode while maintaining the high accuracy of deep learning models. More specifically, to address the data sparsity of short texts, we first propose an external-resource-based short text embedding that uses a pretrained embedding model and a convolutional neural network (CNN). Second, to handle high-volume, high-velocity short text streams, a flexible LSTM network is developed and implemented in a distributed mode to classify them. Third, a concept drift factor is introduced to adapt to the concept drifts caused by changes in the data distribution. Finally, experiments on three real short text data sets demonstrate that, compared with several state-of-the-art short text (stream) classification approaches, the proposed approach classifies short text streams effectively and efficiently while adapting to concept drift.
