Abstract

Today Twitter has become a popular online medium for posting and sharing news and events. Generally, many Twitter posts or “tweets” refer to the same topics or events. Searching on Twitter could return a long list of search results. To solve the problem, we propose an approach for clustering the Twitter search results based on the Suffix Tree Clustering (STC) algorithm. However, two main drawbacks of original STC are some of the returned cluster labels are unmeaningful and it is unable to create hierarchical structure. In this paper, we present a new approach called Suffix Tree Clustering with Label Merging (STC-LM). The key idea of the STC-LM is to merge partially overlapped cluster labels and then create two-level label structure. We performed experiments by using Thai Twitter posts from 12 topics such as flooding, traffic and entertainment. The performance based on the F1 measure is equal to 70%, an improvement of 9% from the baseline method.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.