Abstract
Depending on the scanning mode, existing short text stream clustering methods can be divided into two kinds: one-pass-based and batch-based. One-pass-based methods process each text only once, but cannot handle the sparseness problem very well. Batch-based methods obtain better results by iterating over each batch multiple times, but their efficiency is relatively low. To overcome these problems, this paper presents the Lifelong learning Augmented Short Text stream clustering method (LAST), which incorporates the episodic memory and sparse experience replay modules of lifelong learning into the clustering process. Specifically, LAST processes each text once, but at fixed intervals it randomly samples previously seen texts from the episodic memory and updates the cluster features by performing sparse experience replay. Empirical studies on two public datasets demonstrate that LAST performs on a par with batch-based methods while running at a speed close to that of one-pass-based methods.
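To make the procedure concrete, the following Python sketch illustrates the kind of loop the abstract describes: each arriving text is assigned to a cluster once, stored in an episodic memory, and at fixed intervals a small random sample of remembered texts is re-clustered to refresh the cluster features. The cluster representation, the word-overlap similarity, and all parameter names (replay_interval, replay_size, memory_capacity, new_cluster_threshold) are illustrative assumptions, not the authors' implementation.

import random
from collections import Counter

class SimpleClusterFeature:
    # Hypothetical cluster feature: word counts plus a document count.
    def __init__(self):
        self.word_counts = Counter()
        self.num_docs = 0

    def add(self, tokens):
        self.word_counts.update(tokens)
        self.num_docs += 1

    def similarity(self, tokens):
        # Fraction of the document's tokens already seen in this cluster;
        # a stand-in for the scoring function of a real stream clustering model.
        if self.num_docs == 0:
            return 0.0
        hits = sum(1 for t in tokens if t in self.word_counts)
        return hits / max(len(tokens), 1)

def last_style_clustering(stream, replay_interval=100, replay_size=32,
                          memory_capacity=5000, new_cluster_threshold=0.2,
                          seed=0):
    # A sketch of one-pass clustering with episodic memory and sparse
    # experience replay, following the description in the abstract.
    rng = random.Random(seed)
    clusters = []          # list of SimpleClusterFeature
    episodic_memory = []   # token lists of previously seen texts
    assignments = []

    def assign(tokens):
        # Choose the most similar existing cluster, or open a new one
        # if no cluster exceeds the threshold.
        best_idx, best_sim = -1, new_cluster_threshold
        for idx, cf in enumerate(clusters):
            sim = cf.similarity(tokens)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx < 0:
            clusters.append(SimpleClusterFeature())
            best_idx = len(clusters) - 1
        clusters[best_idx].add(tokens)
        return best_idx

    for i, text in enumerate(stream, start=1):
        tokens = text.lower().split()
        assignments.append(assign(tokens))   # each text is processed once

        # Keep the text in episodic memory; overwrite a random slot once full.
        if len(episodic_memory) < memory_capacity:
            episodic_memory.append(tokens)
        else:
            episodic_memory[rng.randrange(memory_capacity)] = tokens

        # Sparse experience replay: at a fixed interval, re-cluster a small
        # random sample of remembered texts to refresh the cluster features.
        if i % replay_interval == 0:
            sample = rng.sample(episodic_memory,
                                min(replay_size, len(episodic_memory)))
            for old_tokens in sample:
                assign(old_tokens)

    return assignments, clusters

Because the replay step only re-processes replay_size texts every replay_interval arrivals, the extra cost per text is small, which is consistent with the abstract's claim that the method runs close to the speed of a one-pass scan.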
Highlights
Short texts are prevalent on the Web, both on traditional websites, e.g., news titles and search snippets, and on emerging social media, e.g., microblogs and tweets.
The real-time performance of batch-based methods is relatively poor because each batch is iterated multiple times. To overcome these inherent weaknesses while keeping the advantages of both one-pass-based and batch-based methods, we propose a novel clustering method, namely the Lifelong learning Augmented Short Text stream clustering method (LAST), which adds the episodic memory and sparse experience replay modules of lifelong learning to an existing clustering method.
SHORT TEXT STREAM CLUSTERING WITH EPISODIC MEMORY: Considering the inherent characteristics of short text streams, we introduce a novel short text clustering method, namely Lifelong learning Augmented Short Text Stream Clustering (LAST), to alleviate the sparseness problem of short text streams.
Summary
Short texts are prevalent on the Web, including on traditional websites, e.g., news titles and search snippets, and emerging social media, e.g., microblogs and tweets. In recent years, these data have grown at a remarkable rate, producing large quantities of data streams called short text streams. The one-pass-based method assumes that the streaming texts arrive one by one, so each text can be processed only once. The batch-based method assumes that the streaming texts arrive in batches, so the texts in each batch can be processed multiple times.
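The difference between the two scanning modes can be summarized in a short sketch; the function names and parameters (update, batch_size, num_iterations) are hypothetical and only illustrate how often each text is visited under the two assumptions.

def one_pass_scan(stream, update):
    # One-pass mode: every arriving text is visited exactly once.
    for text in stream:
        update(text)

def batch_scan(stream, update, batch_size=256, num_iterations=3):
    # Batch mode: texts are buffered, and each batch is iterated several
    # times, which tends to improve quality at the cost of throughput.
    batch = []
    for text in stream:
        batch.append(text)
        if len(batch) == batch_size:
            for _ in range(num_iterations):
                for buffered in batch:
                    update(buffered)
            batch.clear()
    # Process whatever remains in the final, partially filled batch.
    for _ in range(num_iterations):
        for buffered in batch:
            update(buffered)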