Abstract

Hashtags of microblogs can provide valuable information for many natural language processing tasks, and how to recommend reliable hashtags automatically has attracted considerable attention. However, existing studies assume that all training data crawled from social networks is labelled correctly, while large-sample statistics from real social media show that 8.9% of microblogs with hashtags carry wrong labels. The notable influence of this noisy data on the classifier has previously been ignored. Recency also plays an important role in microblog hashtags, but this information is not used in existing studies. Some temporal hashtags, such as World Cup, surge at a particular time, while at other times the number of people talking about them decreases sharply. To address these two shortcomings, the authors propose a long short-term memory (LSTM)-based model, which uses temporal enhanced selective sentence-level attention to reduce the influence of wrongly labelled microblogs on the classifier. Experimental results on a dataset of 1.7 million microblogs collected from SINA Weibo demonstrate that the proposed method achieves significantly better performance than state-of-the-art methods.
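
The abstract does not give the attention formulation, so the following is only a minimal sketch of how selective sentence-level attention with a recency term might look. It assumes each microblog has already been encoded into a fixed-size vector (for example by an LSTM), that each hashtag has a learned query vector, and that recency enters as an exponential decay on the attention weight. The function name, the parameters `ages_in_days` and `decay_rate`, and the decay form are all hypothetical and are not taken from the paper.

```python
import math


def temporal_selective_attention(sentence_vecs, hashtag_query, ages_in_days, decay_rate=0.1):
    """Pool the microblog vectors of one hashtag into a single vector.

    Illustrative assumption only: the dot-product score and the
    exponential recency decay are not taken from the paper.
    """
    # Relevance score: dot product of each microblog vector with the hashtag query.
    scores = [sum(x * q for x, q in zip(vec, hashtag_query)) for vec in sentence_vecs]
    # Recency term: subtracting decay_rate * age in log space is equivalent
    # to multiplying the unnormalised weight by exp(-decay_rate * age).
    scores = [s - decay_rate * age for s, age in zip(scores, ages_in_days)]
    # Numerically stable softmax over the microblogs.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum: poorly matching (likely mislabelled) or stale microblogs
    # contribute less to the pooled representation.
    dim = len(hashtag_query)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)]
    return weights, pooled


# Hypothetical toy input: the third microblog disagrees with the hashtag query.
vecs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
query = [1.0, 0.0]
ages = [1.0, 2.0, 1.0]
weights, pooled = temporal_selective_attention(vecs, query, ages)
```

In this toy input the third microblog points away from the query, so it receives the smallest weight. Under the assumed decay, an older microblog with the same content as a newer one is also down-weighted.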
