Abstract

Short text is one of the predominant forms of communication, with distinctive characteristics such as short length, high sparsity, and a lack of shared context and word co-occurrence. These characteristics distinguish short text from general text and make short text classification a challenging task. Term weighting is an important pre-processing step for text classification in the vector space model. In this paper, we propose three modifications to existing state-of-the-art term weighting schemes (ifn-tp-icf, RFR and modOR) and a new term weighting scheme, ifn-modRF. We compare the proposed schemes with ten existing unsupervised and supervised schemes using three datasets of informally written short text: a self-labelled dataset of real-world events from Twitter, a Yahoo! questions dataset and a dataset of product reviews. Based on the experimental results using three popular classifiers, we observe that the proposed scheme ifn-modRF achieves the best F1-scores on the Twitter dataset, while the proposed modification modOR is a consistent performer with the best scores in most of the experiments. The proposed modification ifn-tp-icf also outperforms the original scheme in most experiments.

Highlights

  • With the advent of platforms such as social media, users have become publishers of a huge volume of short text containing opinions, discussions, queries and facts

  • Evaluation on the Twitter Events1306 dataset, selecting the best Support Vector Machines (SVM) kernel: we conducted the first experiment on Events1306 to find the best-performing SVM kernel among the radial basis function, sigmoid and linear kernels

  • Term weighting is an important step for text classification and various unsupervised and supervised term weighting schemes have been proposed by researchers
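
To make the distinction between unsupervised and supervised term weighting concrete, the sketch below implements two widely used textbook baselines: tf-idf (unsupervised, uses no labels) and tf-rf, relevance frequency (supervised, uses category labels). It is an illustration only and does not reproduce the schemes proposed in this paper (ifn-tp-icf, RFR, modOR, ifn-modRF). The function names, the toy documents and the labels are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised baseline: weight = tf * ln(N / df).

    docs is a list of token lists; labels are not used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def tf_rf(docs, labels, positive):
    """Supervised baseline: weight = tf * log2(2 + a / max(1, c)).

    a = number of positive-category documents containing the term,
    c = number of documents from all other categories containing the term.
    """
    pos_df, neg_df = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos_df if label == positive else neg_df).update(set(doc))
    return [
        {
            term: tf * math.log2(2 + pos_df[term] / max(1, neg_df[term]))
            for term, tf in Counter(doc).items()
        }
        for doc in docs
    ]


# Hypothetical toy corpus of tokenised short texts (not from the paper's datasets).
docs = [
    ["earthquake", "hits", "city"],
    ["earthquake", "relief", "fund"],
    ["phone", "review", "city"],
]
labels = ["event", "event", "review"]

unsupervised = tf_idf(docs)
supervised = tf_rf(docs, labels, positive="event")
```

In this toy corpus, "earthquake" occurs only in "event" documents, so tf-rf gives it a higher weight than "city", which occurs in both categories; tf-idf cannot make that distinction because it ignores the labels.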


Summary

Introduction

With the advent of platforms such as social media, users have become publishers of a huge volume of short text containing opinions, discussions, queries and facts. This has made short text a predominant form of communication. Examples of short text are microblogs such as Twitter, online product reviews and status updates on social media. Twitter is one of the most popular microblogging platforms with millions of users publishing more than a hundred million messages (called tweets) every day [1]. Apart from personal tweets, users publish messages about real-world events happening around the world. An event is discussed on Twitter in near real-time as it happens. With an explosion in the volume, tweets have been used by researchers in domains such as earthquake prediction

