Abstract

Microblogs play an important role in Online Reputation Management. Companies and organizations in general are increasingly interested in obtaining up-to-the-minute information about emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets about a specific entity that were posted within a short time span. Our approach relies on transfer learning: we contextualize the target collection with a large set of unlabeled background tweets that help improve its clustering. We include background tweets together with target tweets in a TwitterLDA process, and we set only the total number of clusters. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of LDA-based approaches (the need to establish the number of clusters a priori). Our experiments on RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA to the target collection alone. Our data also confirm that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.
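
The idea that only the total number of clusters is set, while the number of target clusters emerges from the data, can be illustrated with a minimal sketch. It assumes that some topic model (TwitterLDA in the paper, not implemented here) has already assigned each tweet in the joint collection to one of K clusters. The function name `target_cluster_sizes` and the toy assignments are hypothetical and serve only as illustration; they are not the authors' implementation or data.

```python
from collections import Counter
from typing import Sequence


def target_cluster_sizes(assignments: Sequence[int],
                         is_target: Sequence[bool]) -> Counter:
    """Count target tweets per cluster, ignoring background tweets."""
    if len(assignments) != len(is_target):
        raise ValueError("assignments and is_target must have the same length")
    return Counter(c for c, t in zip(assignments, is_target) if t)


# Hypothetical cluster assignments from a run with K = 6 total clusters
# over 4 target tweets followed by 6 background tweets.
assignments = [0, 0, 3, 3, 1, 2, 4, 5, 1, 2]
is_target = [True] * 4 + [False] * 6

sizes = target_cluster_sizes(assignments, is_target)

# Clusters that contain at least one target tweet; the rest hold only
# background tweets and are not counted for the target collection.
num_target_clusters = len(sizes)
print(num_target_clusters, dict(sizes))
```

In this toy run, the target tweets occupy two of the six clusters, so the effective number of target clusters is smaller than the total K chosen in advance.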
