Abstract

As globalization increases, people from diverse cultural backgrounds communicate with one another more than ever, and language diversity across regions can hinder that communication. At the same time, social media use and user-generated content have grown exponentially, while existing supervised sentiment polarity classification techniques require labelled training data. This study addresses two problems: sentiment analysis of a Twitter dataset, and word sense disambiguation for Hindi, a morphologically rich language. First, a rule-based fuzzy logic system is used for self-supervised (that is, completely unsupervised) sentiment classification of a social media dataset using three types of lexicons. Combining fuzzy logic with three different types of lexicons opens a new direction for sentiment analysis. The unsupervised fuzzy rules integrate the fuzziness of both the positive and the negative scores, and fuzzy logic-based systems can cope with ambiguity and vagueness. The fuzzy system classifies text using natural language processing (NLP) and word senses. We compare the results of fuzzy rule-based self-supervised sentiment classification, using three types of lexicons on five different datasets, against both unsupervised and supervised sentiment classification techniques. Second, cross-lingual sense embeddings are used in place of cross-lingual word embeddings to resolve the ambiguity issue. Word sense embeddings are produced for the source languages so that multiple senses of each word can be learned. Several evaluation metrics show improved performance for the English-Hindi language pair.
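
To make the idea of fuzzy rules over positive and negative scores concrete, the following is a minimal illustrative sketch, not the system evaluated in this study. It assumes that a lexicon has already produced a positive and a negative score for a text, each normalized to [0, 1]. The triangular membership functions, the nine Mamdani-style rules, the 0 to 10 output scale, and the class thresholds (4 and 6) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b (shoulders allowed)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(score):
    """Map a score in [0, 1] to low/medium/high membership degrees (assumed shapes)."""
    s = min(max(score, 0.0), 1.0)  # clamp to the assumed input range
    return {
        "low": tri(s, 0.0, 0.0, 0.5),
        "med": tri(s, 0.0, 0.5, 1.0),
        "high": tri(s, 0.5, 1.0, 1.0),
    }


# Assumed rule base: (positive level, negative level) -> output class.
RULES = {
    ("low", "low"): "neutral",
    ("med", "low"): "positive",
    ("high", "low"): "positive",
    ("low", "med"): "negative",
    ("med", "med"): "neutral",
    ("high", "med"): "positive",
    ("low", "high"): "negative",
    ("med", "high"): "negative",
    ("high", "high"): "neutral",
}

# Assumed output fuzzy sets on a 0..10 sentiment scale.
OUTPUT_SETS = {
    "negative": (0.0, 0.0, 5.0),
    "neutral": (0.0, 5.0, 10.0),
    "positive": (5.0, 10.0, 10.0),
}


def fuzzy_sentiment(pos_score, neg_score):
    """Return (label, crisp_value) using min/max inference and centroid defuzzification."""
    pos, neg = fuzzify(pos_score), fuzzify(neg_score)

    # Firing strength per output class: max over rules of min(antecedents).
    strength = {"negative": 0.0, "neutral": 0.0, "positive": 0.0}
    for (p_level, n_level), out in RULES.items():
        strength[out] = max(strength[out], min(pos[p_level], neg[n_level]))

    # Clip each output set at its firing strength, aggregate with max,
    # and take the centroid over a discretized 0..10 universe.
    num = den = 0.0
    for i in range(101):
        y = i / 10.0
        mu = max(min(strength[k], tri(y, *OUTPUT_SETS[k])) for k in OUTPUT_SETS)
        num += y * mu
        den += mu
    crisp = num / den if den > 0 else 5.0

    if crisp < 4.0:  # assumed threshold
        return "negative", crisp
    if crisp > 6.0:  # assumed threshold
        return "positive", crisp
    return "neutral", crisp


if __name__ == "__main__":
    for p, n in [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]:
        print(p, n, fuzzy_sentiment(p, n))
```

Because no labelled training data is involved, a sketch like this needs only the lexicon scores as input. In practice, the membership shapes and the rule base would have to be tuned to the lexicons actually used.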
