Abstract

The development of wireless communication technology and mobile devices has brought about the advent of an era of sharing text data that overflows on social media and the web. In particular, social media has become a major source of storing people’s sentiments in the form of opinions and views on specific issues in the form of unstructured information. Therefore, the importance of emotion analysis is increasing, especially with machine learning for both personal life and companies’ management environments. At this time, data reliability is an essential component for data classification. The accuracy of sentiment classification can be heavily determined according to the reliability of data, in which case noise data may also influence this classification. Although there is stopword that does not have meaning in such noise data, data that does not fit the purpose of analysis can also be referred to as noise data. This paper aims to provide an analysis of the impact of profanity data on deep learning-based sentiment classification. For this purpose, we used movie review data on the Web and simulated the changes in performance before and after the removal of the profanity data. The accuracy of the model trained with the data and the model trained with the data before removal were compared to determine whether the profanity is noise data that lowers the accuracy in sentiment analysis. The simulation results show that the accuracy dropped by about 2% when judging profanity as noise data in the sentiment classification for review data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call