Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media

Semiha Makinist, Betül Ay Karakuş, Galip Aydın, İbrahim Rıza Hallaç, A Atangana, H Bulut, T Mekkaoui, F Bin Muhammad Belgacem, H.M Baskonus, Z Hammouch

doi:10.1051/itmconf/20171301030

Open Access

https://doi.org/10.1051/itmconf/20171301030

Journal: ITM Web of Conferences	Publication Date: Jan 1, 2017
Citations: 3	License type: cc-by

Affiliation: Software (Spain), Fırat University

Abstract

Successful analysis studies require a public dataset with properties suitable for sentiment analysis [1], event prediction, trend detection and other text mining applications. The vast majority of data on social media is text-based, and machine learning methods cannot be applied directly to this raw data, since several preprocessing steps are required before the algorithms can be run. For example, different misspellings of the same word enlarge the word vector space unnecessarily, which reduces the success of the algorithm and increases the computational power required. This paper presents an improved Turkish dataset together with an effective spelling correction algorithm based on Hadoop [2]. The collected data is stored on the Hadoop Distributed File System, and the text-based data is processed with the MapReduce programming model. This method is suitable for storing and processing large volumes of text-based social media data. In this study, movie reviews were automatically collected with Apache ManifoldCF (MCF) [3] and data clusters were created. Various methods, such as Levenshtein and Fuzzy String Matching, were compared in order to create a public dataset from the collected data. Experimental results show that the proposed algorithm successfully detects and corrects spelling errors, and that the resulting dataset can be used as an open source dataset in sentiment analysis studies.
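The effect of misspellings on vocabulary size, and the idea of correcting them by edit distance, can be illustrated with a small sketch. This is not the authors' Hadoop implementation, whose details are not given here; it is a minimal single-machine example of Levenshtein-based correction. The lexicon, the sample tokens, the `correct` helper and the distance threshold of 2 are all assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute; cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct(word: str, lexicon: set, max_distance: int = 2) -> str:
    """Return the closest lexicon entry within max_distance, else the word unchanged."""
    if word in lexicon:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda cand: (levenshtein(word, cand), cand))
    return best if levenshtein(word, best) <= max_distance else word


# Hypothetical lexicon and review tokens, chosen only for illustration.
LEXICON = {"film", "güzel", "harika", "kötü"}
tokens = ["film", "flim", "güzel", "güzell", "guzel", "harika", "harka", "kötü"]

corrected = [correct(t, LEXICON) for t in tokens]
raw_vocabulary = Counter(tokens)         # one entry per distinct spelling
clean_vocabulary = Counter(corrected)    # one entry per corrected word

print(len(raw_vocabulary), "->", len(clean_vocabulary))
```

In this toy input the eight distinct spellings collapse to four words after correction, which is the reduction of the word vector space that the abstract describes. A real pipeline would need a much larger lexicon and a faster candidate search than a linear scan.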

Highlights

Today, most of the social media content we frequently encounter across a very wide range of fields is generated by video sharing (YouTube), photo sharing (Instagram) and location-based applications (Foursquare), blogs, microblogs (Twitter) and social networks (Facebook).
In this study, we propose a method for creating a Turkish dataset to improve the performance of sentiment analysis studies that are based on textual data.
Data warehouses and open-source libraries for many languages are reported in the literature, but a complete Turkish dataset and library are not available as open source.

Summary

Introduction

Most of the social media content we frequently encounter across a very wide range of fields is generated by video sharing (YouTube), photo sharing (Instagram) and location-based applications (Foursquare), blogs, microblogs (Twitter) and social networks (Facebook). A company may collect data from social media to analyze its customers' opinions of its products. One of the most important reasons for this is that Twitter users share many short messages. By making use of these social media posts, one can determine the distribution of positive and negative opinions, the tendencies of targeted customer groups, and the reputation and influence of people or companies on social media. These types of analyses are mainly carried out by applying machine learning techniques to large volumes of data. Different methods are used to collect the data; for example, data can be collected by using the application programming interface
