Abstract

Sentiment analysis is the process of detecting sentiments and classifying them as positive, negative, or neutral. Most sentiment analysis research focuses on English lexicons, while Malay remains under-resourced. Sentiment analysis of Malaysian social media is challenging because users mix English and Malay. The objective of this study was to develop a cross-lingual sentiment analysis system using a lexicon-based approach. English and Malay lexicons were combined in the system, Twitter data were then collected, and the results were presented in graphs. The results showed that the classifier was able to determine the sentiments. This study is significant for companies and governments seeking to understand people’s opinions on social networks, especially in Malay-speaking regions.
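
The combined-lexicon idea described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the word lists, the polarity scores of +1 and -1, and the names `ENGLISH_LEXICON`, `MALAY_LEXICON`, `tokenize`, and `classify` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicons (assumption): the study's actual lexicons
# and scoring scheme are not given in this summary.
ENGLISH_LEXICON = {"good": 1, "love": 1, "bad": -1, "hate": -1}
MALAY_LEXICON = {"bagus": 1, "suka": 1, "teruk": -1, "benci": -1}

# Merge both languages into one lookup table so that mixed-language
# text can be scored in a single pass.
COMBINED_LEXICON = {**ENGLISH_LEXICON, **MALAY_LEXICON}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, lexicon=COMBINED_LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these toy lexicons, `classify("Servis ni bagus, I love it")` returns `"positive"` because both "bagus" and "love" are found in the merged table, while a sentence with no lexicon words is labelled `"neutral"`.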

Highlights

  • A commonly accepted contextual definition of Big Data is a dataset characterized by the 3Vs (Variety, Velocity, and Volume) that requires particular analytical methods and technology to transform it into value [1]

  • The technique of taking the frequent nouns or noun phrases that appear at the beginning of a sentiment yields a “representative sample” of opinions: both negative and positive points of view are covered by the frequent nouns or noun phrases, while the remaining terms are treated as infrequent features

  • The results show that the classifier achieves high recall, which suggests that its output can be trusted
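
For readers unfamiliar with the metric, recall for a class is the share of items that truly belong to that class and were labelled as such. The sketch below uses the standard definition, true positives divided by true positives plus false negatives; the function name `recall` and the sample labels are hypothetical and are not data from the study.

```python
def recall(true_labels, predicted_labels, target):
    """Return the recall for one class: TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: items of the target class labelled correctly.
    tp = sum(1 for t, p in pairs if t == target and p == target)
    # False negatives: items of the target class labelled as something else.
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp + fn == 0:
        return 0.0  # the class never occurs in the true labels
    return tp / (tp + fn)


# Hypothetical labels for five tweets (assumption, not study data).
truth = ["positive", "positive", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "negative", "neutral", "positive"]
positive_recall = recall(truth, predicted, "positive")
```

Here two of the three truly positive tweets are recovered, so `positive_recall` is 2/3.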


Summary

INTRODUCTION

A commonly accepted contextual definition of Big Data is a dataset characterized by the 3Vs (Variety, Velocity, and Volume) that requires particular analytical methods and technology to transform it into value [1]. The three main types of data structure include: (1) structured data, which has a defined data type, format, and structure and covers any data that resides in a fixed field within a record or file, for example simple spreadsheets, traditional RDBMS tables, online analytical processing (OLAP) data cubes, CSV files, and transaction data; and (2) semi-structured data, which does not conform to the formal structure of the data models associated with relational databases or other forms of data tables, but contains tags or other markers that separate semantic elements and enforce hierarchies of records and fields. With 19.05 million social media users in Malaysia [6], there is a need to provide resources and tools in the Malay language for sentiment analysis.

RESEARCH BACKGROUND
Machine Learning Approach
Lexicon-based Approach
Proposed Cross-language Lexicon-based Approach
Preprocessing
Lexicon
Sentiment Analysis
RESULT
EXAMPLE ANALYSIS
CONCLUSION
