Abstract

In this paper, we proposed a lexicon for emotion analysis in Turkish covering six emotion categories: happiness, fear, anger, sadness, disgust, and surprise. We also investigated the effects of a lemmatizer and a stemmer, two term-weighting schemes, four lexicon enrichment methods, and a term selection approach on lexicon construction. To do this, we generated a Turkish emotion lexicon from a dataset, TREMO, containing 25,989 documents. We first preprocessed the documents to obtain the dictionary and stem forms of each term using a lemmatizer and a stemmer. We then proposed two different weighting schemes that take into account term frequency, term-class frequency, and mutual information (MI) values for the six emotion categories. Next, we enriched the lexicon using bigram and concept hierarchy methods, and performed term selection for efficiency. Finally, we used the proposed lexicon to compare the performance of the lexicon-based approach with that of a machine-learning-based approach. The experiments showed that the proposed lexicon produces comparable results efficiently in emotion analysis of Turkish text.

Highlights

  • In today’s world, a huge amount of raw text data has been generated with the increasing use of social media applications and communication equipment

  • We proposed a Turkish emotion lexicon (TEL) that can be used in emotion analysis

  • We examined the effects of a lemmatizer and a stemmer, two term-weighting schemes, four different lexicon enrichment methods, and a term selection process on lexicon-based emotion analysis


Summary

Introduction

In today’s world, a huge amount of raw text data has been generated with the increasing use of social media applications and communication equipment. Besides the lexicons developed for sentiment analysis, several lexicons are available for emotion analysis in English. Mohammad and Turney [18] created a lexicon named EmoLex, which contains 14,182 words in total. In another study, Mohammad [23] generated a large dataset called the Twitter emotion corpus by collecting tweets with hashtags corresponding to Ekman’s six emotion categories. He then used this dataset to create a new emotion lexicon called TEC, containing 11,418 word types. The second dataset was generated by Demirci, who focused on extracting emotion from Turkish microblog entries [33]. She collected tweets for the six emotions (anger, disgust, fear, joy, sadness, and surprise) using the Twitter search mechanism for hashtags. The last section concludes the paper with a summary of the experiments we performed and a look at further studies on this topic.

Materials and methods
Term weighting
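
The abstract describes weighting schemes that take term frequency, term-class frequency, and mutual information (MI) into account, but the exact formulas are not reproduced in this summary. As a rough illustration only, the sketch below weights each term per emotion class with the standard pointwise MI estimate computed from document counts, and then labels a document by summing the weights of its terms. The function names `build_lexicon` and `classify`, the clipping of negative values to zero, and the toy English tokens are all assumptions made for this example; they are not the paper's scheme and not TREMO data.

```python
import math
from collections import Counter, defaultdict


def build_lexicon(docs):
    """Build {term: {label: weight}} from (tokens, label) pairs.

    The weight is the standard pointwise MI estimate
    log2(A * N / (df(term) * df(label))), clipped at zero.
    This is an illustrative choice, not the paper's weighting scheme.
    """
    n_docs = len(docs)
    class_docs = Counter(label for _, label in docs)   # documents per class
    term_docs = Counter()                               # documents per term
    term_class_docs = defaultdict(Counter)              # term-class document counts

    for tokens, label in docs:
        for term in set(tokens):                        # count each term once per document
            term_docs[term] += 1
            term_class_docs[term][label] += 1

    lexicon = {}
    for term, per_class in term_class_docs.items():
        weights = {}
        for label, a in per_class.items():
            pmi = math.log2(a * n_docs / (term_docs[term] * class_docs[label]))
            weights[label] = max(pmi, 0.0)              # assumption: ignore negative association
        lexicon[term] = weights
    return lexicon


def classify(tokens, lexicon):
    """Return the label with the highest summed weight, or None if no term matches."""
    scores = Counter()
    for term in tokens:
        for label, weight in lexicon.get(term, {}).items():
            if weight > 0:
                scores[label] += weight
    return scores.most_common(1)[0][0] if scores else None


# Toy data with placeholder tokens (hypothetical, for illustration only).
toy_docs = [
    (["happy", "today"], "happiness"),
    (["happy", "smile"], "happiness"),
    (["afraid", "dark"], "fear"),
    (["afraid", "today"], "fear"),
]
toy_lexicon = build_lexicon(toy_docs)
predicted = classify(["happy", "today"], toy_lexicon)
```

In this toy example, "today" occurs equally often in both classes and therefore receives zero weight, so the prediction is driven by the class-specific term. In practice the tokens would be the lemmatized or stemmed forms described in the abstract.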
Experiments and discussions
Findings
Conclusion