Abstract

The National Happiness Index (NHI) is a national indicator of development that estimates the economic and social well-being of a nation's individuals. With the proliferation of the internet, people share a significant amount of data on social media websites, and this data can be processed with sentiment analysis techniques to calculate the NHI. The approaches used in the literature to calculate NHI include the lexicon-based approach and the machine learning approach. All of these existing approaches were proposed for sentiments written in the English language, and they fail for complex Roman Urdu tweets that contain more than two sub-opinions. This research has three primary objectives: (1) to investigate whether current sentiment analysis techniques are sufficient for the classification of complex Roman Urdu sentiments; (2) to propose a rule-based classifier for the classification of Roman Urdu sentiments comprising multiple sub-opinions; (3) to calculate NHI using Roman Urdu sentiments. For this purpose, we propose a discourse information extractor, a rule-based method (3-RBC), and a machine learning classifier. The experimental results show that 3-RBC is efficient for feature identification and that its improvement over the baseline classifiers is statistically significant. 3-RBC increased accuracy by 7% and precision by 8%, which provides evidence that the proposed technique improves the calculation of NHI.
