Abstract
In this paper, we propose a lexicon for emotion analysis in Turkish covering six emotion categories: happiness, fear, anger, sadness, disgust, and surprise. We also investigate the effects of a lemmatizer and a stemmer, two term-weighting schemes, four lexicon enrichment methods, and a term selection approach on lexicon construction. To do this, we generated a Turkish emotion lexicon from TREMO, a dataset containing 25,989 documents. We first preprocessed the documents with a lemmatizer and a stemmer to obtain the dictionary form and the stem of each term. We then proposed two weighting schemes that take into account term frequency, term-class frequency, and mutual information (MI) values for the six emotion categories. Next, we enriched the lexicon using bigram and concept hierarchy methods and performed term selection to improve efficiency. Finally, using the proposed lexicon, we compared the performance of the lexicon-based approach with that of a machine-learning-based approach. The experiments showed that the proposed lexicon efficiently produces comparable results for emotion analysis of Turkish text.
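As a rough illustration of how term-class frequency and mutual information can be combined into a lexicon weight, the following is a minimal sketch, not the weighting schemes proposed in the paper. The function names `build_lexicon` and `classify`, the toy documents, and the particular weight (term-class document frequency multiplied by positive pointwise MI) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_lexicon(docs):
    """Build {term: {emotion: weight}} from (tokens, emotion) pairs.

    The weight is an illustrative assumption: term-class document
    frequency times positive pointwise mutual information (PMI).
    """
    n_docs = len(docs)
    class_docs = Counter(label for _, label in docs)  # documents per emotion
    term_docs = Counter()                             # documents containing the term
    term_class_docs = defaultdict(Counter)            # term -> emotion -> documents

    for tokens, label in docs:
        for term in set(tokens):  # count each term once per document
            term_docs[term] += 1
            term_class_docs[term][label] += 1

    lexicon = {}
    for term, per_class in term_class_docs.items():
        weights = {}
        for label, joint in per_class.items():
            # PMI(t, c) = log2( P(t, c) / (P(t) * P(c)) )
            pmi = math.log2(joint * n_docs / (term_docs[term] * class_docs[label]))
            weights[label] = max(pmi, 0.0) * joint
        lexicon[term] = weights
    return lexicon


def classify(tokens, lexicon):
    """Sum the per-emotion weights of known terms and return the top emotion."""
    scores = Counter()
    for term in tokens:
        for label, weight in lexicon.get(term, {}).items():
            scores[label] += weight
    return max(scores, key=scores.get) if scores else None


# Toy, already-lemmatized documents (hypothetical, not taken from TREMO).
docs = [
    (["mutlu", "sabah"], "happiness"),
    (["mutlu", "sevgi"], "happiness"),
    (["korku", "gece"], "fear"),
    (["korku", "sabah"], "fear"),
]
lexicon = build_lexicon(docs)
print(classify(["mutlu", "sabah"], lexicon))
```

In this sketch, a term that occurs equally often in every category (here `sabah`) receives a weight of zero, so only category-specific terms influence the predicted emotion.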
Highlights
In today’s world, a huge amount of raw text data has been generated with the increasing use of social media applications and communication equipment
We proposed a Turkish emotion lexicon (TEL) that can be used in emotion analysis
We examined the effects of a lemmatizer and a stemmer, two term-weighting schemes, four different lexicon enrichment methods, and a term selection process on lexicon-based emotion analysis
Summary
In today’s world, a huge amount of raw text data has been generated with the increasing use of social media applications and communication equipment. In addition to lexicons developed for sentiment analysis, several lexicons are available for emotion analysis in English. Mohammad and Turney [18] created a lexicon named EmoLex, which contains 14,182 words in total. In another study, Mohammad [23] generated a large dataset, the Twitter emotion corpus, by collecting tweets with hashtags corresponding to Ekman’s six emotion categories. Using this dataset, he created a new emotion lexicon called TEC, containing 11,418 word types. The second dataset was generated by Demirci, who focused on extracting emotion from Turkish microblog entries [33]. She collected tweets for six emotions (anger, disgust, fear, joy, sadness, and surprise) using the Twitter search mechanism for hashtags. The last section concludes the paper with a summary of the experiments we performed and an outlook on further studies on this topic.
More From: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES