Abstract
Twitter has become an instrumental means of spreading information, opinions, and awareness about real-world events. The classification of event-related tweets is a challenging problem, since tweets are noisy and sparse pieces of text that lack contextual information. Related work proposes contextual enrichment techniques using external sources (e.g., the semantic web, external documents), often relying on underlying assumptions about the target events. However, these approaches lack guidelines for determining the textual features to enrich, the external sources to use, the properties to explore, and how to prevent the inclusion of unrelated information. In this paper, we propose a hybrid semantic enrichment framework for the classification of event-related tweets. We contribute to this field by integrating different contextual enrichment strategies into a unified framework targeted at a broad range of event types, in which each enrichment technique plays a role in improving event classification. The framework also includes a solution for handling the large number of features that result from semantic enrichment, combining a pruning method that selects domain-relevant semantic features with general-purpose feature selection techniques. We assessed the contribution of each framework component to the improvement of event classification using a broad experimental setting. Using seven events of distinct natures, we outperformed a word embeddings baseline in 93.6% of cases and a textual baseline in 60.3% of cases. In most cases, we improved recall with no significant impact on precision.