Abstract
With the rapid growth of social networking services on the Internet, a huge amount of information is continuously generated in real time. As a result, sentiment analysis of online reviews and messages has become a popular research topic [1]. In this paper, a novel modified Chi Square-based feature clustering and weighting scheme is proposed for the sentiment analysis of Twitter messages. Along with part-of-speech tagging, the discriminability and dependency of the words in the tagged training dataset are taken into account in the clustering and weighting process. The multinomial Naïve Bayes model is also employed to handle redundant features, and the influence of emotional words is raised to maximize accuracy. Computer simulation with the Sentiment 140 workload shows that the proposed scheme significantly outperforms four existing representative sentiment analysis schemes in terms of accuracy, regardless of the size of the training and test data.
Highlights
A massive volume of data is generated and shared through the Internet [2,3,4]
In this paper, a novel feature weighting approach is proposed, inspired by the expectation that strengthening words of strong discriminability may yield higher sentiment analysis accuracy [22, 23]
Twitter sentiment analysis has become a promising technique for industry and academia
Summary
A massive volume of data is generated and shared through the Internet [2,3,4]. Data originating from the Internet takes various forms, and text in particular is widely used for expressing and sharing information between individual users. In this paper, a novel feature weighting approach is proposed, inspired by the expectation that strengthening words of strong discriminability may yield higher sentiment analysis accuracy [22, 23]. A novel feature reduction method is proposed to reduce the dimensionality (the number of features) [26]; it omits irrelevant data when classifying the training dataset into a small number of features and achieves reasonable computational complexity when weighting the words [27, 28]. A novel composite feature weighting technique is also proposed, which considers the dependency derived using the modified Chi Square technique and the discriminability of the clustered feature set.
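To make the dependency measure concrete, the sketch below computes the standard, unmodified Chi Square statistic between each word and each sentiment class from document counts. It is not the modified Chi Square, clustering, or composite weighting of the proposed scheme, whose details are not given here. The function name `chi_square_scores` and the four toy messages are assumptions made purely for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Standard (unmodified) chi-square score of each term against each class.

    docs: list of token lists; labels: parallel list of class labels.
    Returns {term: {label: score}}. Hypothetical helper, for illustration only.
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()        # number of documents containing the term
    term_class_count = Counter()  # same, restricted to one class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            term_class_count[(term, label)] += 1

    scores = {}
    for term, n_t in term_count.items():
        scores[term] = {}
        for label, n_c in class_count.items():
            a = term_class_count[(term, label)]  # term present, in class
            b = n_t - a                          # term present, other classes
            c = n_c - a                          # term absent, in class
            d = n - n_t - c                      # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            # A zero denominator means the term or class has no variation.
            scores[term][label] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy, made-up messages (not taken from the Sentiment 140 data).
docs = [
    ["love", "this", "phone"],
    ["great", "phone"],
    ["hate", "this", "phone"],
    ["awful", "service"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels)
for term in ("love", "this"):
    print(term, scores[term]["pos"])
```

In this toy data, "this" appears once in each class and therefore scores 0, while "love" appears only in positive messages and scores higher. A weighting scheme in the spirit of the one summarized above would presumably give more influence to words like the latter.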