Abstract
Sentiment Analysis (SA) has gained popularity over the years because of the benefits it brings to economics, sociology and politics. SA enables the observation, testing and quantification of public emotion toward a particular issue. However, little SA work has been done on the Malay language, especially on the Malay dialects used in social media. The research presented in this paper aims to perform SA on one derivative of the Malay language, namely the Sabah language. Unlike many other languages, the Sabah language has no fixed spelling and, when used in an unstructured form as on social media, poses particular difficulties for SA. This paper takes a lexicon-based approach to SA of the Sabah language as used on social media. The corpus selected for the investigation consisted of Facebook posts and tweets written in the Sabah language, 443 posts and tweets in total. Each was manually annotated as positive, negative or neutral by three annotators. Because the Sabah language is a derivative of Malay, its vocabulary consists largely of Malay words. For this reason, in the Sentiment Lexicon (SL) construction process, an opinion-bearing Malay SL is retrieved, modified and expanded to build the Sabah SL. Three different methods of assigning scores to the opinion-bearing words in the SL were employed during SL construction: (i) Simple PSA, (ii) Simple PSA with Switch Negation (PSA-SN) and (iii) Strength-based PSA. A pre-processing phase that includes a spellchecker and a short-form corrector is also implemented to reduce the number of distinct words to be analyzed. In the classification phase, two methods, simple classification and bias-aware classification, were used to classify the posts. Experiments are conducted to show the effect of SL modification and expansion, the effect of pre-processing, and the effect of bias-aware classification on SA performance.
Results show that the highest accuracy, 85.10%, was achieved using bias-aware classification on pre-processed text with the modified and expanded SL, with scores assigned using Simple PSA.
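To make the lexicon-based pipeline concrete, the following is a minimal sketch in Python of how a post might be scored against an SL with switch negation and then classified with a shifted decision threshold. It is not the implementation used in this work: the tiny lexicon, the negator list, the function names `score_post` and `classify`, and the `bias` parameter are all assumptions made for illustration, and the abstract does not specify how bias-aware classification is actually defined.

```python
# Illustrative sketch only; the lexicon, negators and bias handling are
# assumptions, not the resources or method used in the paper.

# Hypothetical toy sentiment lexicon: word -> polarity score.
LEXICON = {"bagus": 1, "cantik": 1, "teruk": -1, "sakit": -1}

# Hypothetical negation words that flip the polarity of the next word.
NEGATORS = {"tidak", "bukan"}


def score_post(text, lexicon=LEXICON, negators=NEGATORS):
    """Sum lexicon scores over the tokens of a post.

    A negator switches the sign of the word that immediately follows it
    (a simple form of switch negation).
    """
    total = 0
    negate = False
    for token in text.lower().split():
        if token in negators:
            negate = True
            continue
        if token in lexicon:
            score = lexicon[token]
            total += -score if negate else score
        # Negation only applies to the word directly after the negator.
        negate = False
    return total


def classify(score, bias=0):
    """Map a score to a label after shifting it by an assumed bias offset.

    With bias=0 this is a plain sign-based classifier; a non-zero bias
    moves the decision threshold to compensate for a skewed lexicon.
    """
    adjusted = score - bias
    if adjusted > 0:
        return "positive"
    if adjusted < 0:
        return "negative"
    return "neutral"
```

In this sketch, `classify(score_post("tidak bagus"))` flips the positive score of "bagus" and labels the post negative. In practice the bias offset would have to be estimated from annotated data rather than set by hand.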