Abstract

The text mining literature shows a growing body of work on the automatic identification of sentiment in text. Sentiment polarity classification is one of the most important text mining tasks. The typical approach to polarity classification uses lexicons to count word usage from linguistic or emotional perspectives. One of the most widely used lexicons is the Linguistic Inquiry and Word Count (LIWC). LIWC assigns words to categories (e.g., positive emotion) based on a lexicon of words associated with psycholinguistic categories. It has been widely used in the polarity classification task with good results. However, it only counts individual words, discarding text structure and ignoring important semantic relationships between words. In this work, we present LIWBC, an algorithm that counts bigrams using the lexicon provided by LIWC. The goal is to incorporate text structure information to improve polarity classification with the LIWC lexicon. We conducted experiments to evaluate LIWBC on two real datasets: the first consists of blogger posts; the second is the movie reviews dataset, which contains full-text movie reviews from IMDB. Both datasets were processed with LIWC and LIWBC. We then ran four classification algorithms on the data processed by LIWC and by LIWBC. The SVM algorithm run on LIWBC data yielded the best result on both datasets: compared with the LIWC-processed data, its F1 score improved by 2.2% on the blogger posts dataset and by 2.5% on the movie reviews dataset.
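The contrast between word counting and bigram counting can be illustrated with a small sketch. The abstract does not spell out LIWBC's exact definition, so this example assumes that a bigram feature is an ordered pair of lexicon categories assigned to two adjacent words. The lexicon `TOY_LEXICON` and the functions `count_unigrams` and `count_bigrams` are hypothetical; the real LIWC dictionary is licensed and far larger. Entries ending in `*` are treated as prefix matches, in the style of LIWC dictionary entries.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for the LIWC dictionary.
# A trailing "*" marks a prefix entry (e.g. "happ*" matches "happy").
TOY_LEXICON = {
    "posemo": ["good", "love", "happ*"],
    "negemo": ["bad", "hate", "sad"],
    "negate": ["not", "never", "no"],
}


def matches(word, entry):
    """True if `word` matches a lexicon entry (exact or prefix match)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def categories_of(word, lexicon):
    """Sorted list of categories with at least one entry matching `word`."""
    return sorted(
        cat for cat, entries in lexicon.items()
        if any(matches(word, e) for e in entries)
    )


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_unigrams(text, lexicon):
    """LIWC-style counting: one hit per (word, category), order ignored."""
    counts = Counter()
    for tok in tokenize(text):
        for cat in categories_of(tok, lexicon):
            counts[cat] += 1
    return counts


def count_bigrams(text, lexicon):
    """Assumed LIWBC-style counting: ordered category pairs of adjacent words."""
    tokens = tokenize(text)
    counts = Counter()
    for left, right in zip(tokens, tokens[1:]):
        for c1 in categories_of(left, lexicon):
            for c2 in categories_of(right, lexicon):
                counts[(c1, c2)] += 1
    return counts


if __name__ == "__main__":
    text = "I am not happy, it was not good"
    print(count_unigrams(text, TOY_LEXICON))
    print(count_bigrams(text, TOY_LEXICON))
```

For "I am not happy, it was not good", the unigram counts only report that `negate` and `posemo` each occur twice, while the bigram counts record that both positive words directly follow a negation, as `("negate", "posemo")` with a count of 2. Either `Counter` could then be turned into a fixed-length feature vector for a classifier such as an SVM.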
