Abstract

Depending on the scanning mode, existing short text stream clustering methods fall into two categories: one-pass-based and batch-based. One-pass-based methods process each text only once, but handle the sparseness problem poorly. Batch-based methods obtain better results by allowing multiple iterations over each batch, but are relatively inefficient. To overcome these problems, this paper presents the Lifelong learning Augmented Short Text stream clustering method (LAST), which incorporates the episodic memory module and the sparse experience replay module of lifelong learning into the clustering process. Specifically, LAST processes each text once, but at a certain interval it randomly samples some previously seen texts from the episodic memory and updates cluster features by performing sparse experience replay. Empirical studies on two public datasets demonstrate that the LAST-based method performs on a par with the batch-based method and runs at close to the speed of the one-pass-based method.
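
The replay schedule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Cluster` class, the word-overlap `score`, and the `threshold`, `replay_every`, and `replay_size` parameters are all hypothetical, and the overlap score stands in for whatever clustering model LAST actually builds on. The episodic memory here simply stores every seen text, which is also an assumption.

```python
import random
from collections import Counter


class Cluster:
    """Cluster feature (hypothetical): word counts plus number of texts."""

    def __init__(self):
        self.word_counts = Counter()
        self.size = 0

    def add(self, tokens):
        self.word_counts.update(tokens)
        self.size += 1

    def remove(self, tokens):
        self.word_counts.subtract(tokens)
        self.word_counts += Counter()  # in-place add keeps only positive counts
        self.size -= 1

    def score(self, tokens):
        # Stand-in similarity: share of the text's tokens already in the cluster.
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if self.word_counts[t] > 0) / len(tokens)


def assign(tokens, clusters, threshold):
    """Return the index of the best non-empty cluster, or open a new one."""
    best_idx, best_score = None, threshold
    for i, cluster in enumerate(clusters):
        if cluster.size == 0:
            continue
        s = cluster.score(tokens)
        if s >= best_score:
            best_idx, best_score = i, s
    if best_idx is None:
        clusters.append(Cluster())
        best_idx = len(clusters) - 1
    return best_idx


def cluster_stream(stream, threshold=0.5, replay_every=4, replay_size=2, seed=0):
    rng = random.Random(seed)
    clusters = []
    memory = []  # episodic memory: [tokens, cluster_index] per seen text
    for step, text in enumerate(stream, start=1):
        # One-pass step: each incoming text is assigned exactly once.
        tokens = text.lower().split()
        idx = assign(tokens, clusters, threshold)
        clusters[idx].add(tokens)
        memory.append([tokens, idx])
        # Sparse experience replay: at a fixed interval, re-assign a small
        # random sample of remembered texts to refresh the cluster features.
        if step % replay_every == 0:
            for item in rng.sample(memory, min(replay_size, len(memory))):
                old_tokens, old_idx = item
                clusters[old_idx].remove(old_tokens)
                new_idx = assign(old_tokens, clusters, threshold)
                clusters[new_idx].add(old_tokens)
                item[1] = new_idx
    return [idx for _, idx in memory]
```

Calling `cluster_stream` on a list of short strings returns one cluster index per text. With `replay_every` set larger than the stream length, the sketch reduces to a plain one-pass method, which makes the extra cost of replay easy to see: only `replay_size` texts are revisited per interval rather than a whole batch.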

Highlights

  • Short texts are prevalent on the Web, including on traditional websites, e.g., news titles and search snippets, and emerging social media, e.g., microblogs and tweets

  • Batch-based methods have relatively poor real-time performance because each batch is iterated over multiple times. To overcome these inherent weaknesses while keeping the advantages of both one-pass-based and batch-based methods, we propose a novel clustering method, the Lifelong learning Augmented Short Text stream clustering method (LAST), which adds the episodic memory module and the sparse experience replay module of lifelong learning to an existing clustering method

  • Considering the inherent characteristics of short text streams, we introduce a novel short text clustering method, namely Lifelong learning Augmented Short Text Stream Clustering (LAST), to alleviate the sparseness problem of short text streams

Summary

Introduction

Short texts are prevalent on the Web, including on traditional websites, e.g., news titles and search snippets, and emerging social media, e.g., microblogs and tweets. In recent years, these data have swept the world at an alarming rate and have produced large quantities of data streams, called short text streams. The one-pass-based method assumes that the streaming texts arrive one by one, so each text can be processed only once. The batch-based method assumes that the streaming texts arrive in batches, so the texts in each batch can be processed multiple times.
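
The difference between the two scanning modes can be made concrete with a small sketch. The function names `one_pass`, `batch_based`, and `count_visits`, and the `batch_size` and `iterations` parameters, are illustrative assumptions; `update` is a placeholder for any clustering update step.

```python
from collections import Counter
from itertools import islice


def one_pass(stream, update):
    """One-pass mode: texts arrive one by one and each is processed once."""
    for text in stream:
        update(text)


def batch_based(stream, update, batch_size=3, iterations=2):
    """Batch mode: texts arrive in batches and each batch is swept repeatedly."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        for _ in range(iterations):
            for text in batch:
                update(text)


def count_visits(scan, stream, **kwargs):
    """Count how many times a scanning mode touches each text."""
    visits = Counter()
    scan(stream, lambda text: visits.update([text]), **kwargs)
    return visits
```

Passing the same stream to `count_visits` with each mode shows the trade-off in terms of work: the one-pass mode touches every text once, while the batch mode touches every text `iterations` times.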
