Abstract
Detecting emotions in tweets is a huge challenge due to its limited 140 characters and extensive use of twitter language with evolving terms and slangs. This paper uses various preprocessing techniques, forms a feature vector using lexicons and classifies tweets into Paul Ekman's basic emotions namely, happy, sad, anger, fear, disgust and surprise using machine learning. Preprocessing is done using the dictionaries available for emoticons, interjections and slangs and by handling punctuation marks and hashtags. The feature vector is created by combining words from the NRC Emotion lexicon, WordNet-Affect and online thesaurus. Feature vectors are assigned weight based on the presence of punctuations and negations in the feature and the tweets are classified using naive Bayes, SVM and random forests. The use of lexicon features and a novel weighting scheme has produced a considerable gain in terms of accuracy with random forest achieving maximum accuracy of almost 73%.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: International Journal of Data Analysis Techniques and Strategies
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.