Abstract

Despite the advantages of typical sentiment analysis, which focuses on predicting the positive or negative polarity of given sentences, performing sentiment analysis at a higher level, namely at the sentence and document level, has two main drawbacks. First, the overall sentiment of a sentence or paragraph may not yield accurate and precise information, because the polarity holds for the broader context and not for particular targets. Second, many sentences or paragraphs express opposing polarities towards different targets, which makes it difficult or impossible to assign an accurate overall polarity. The need to detect aspect terms and their corresponding polarity gave rise to aspect-based sentiment analysis (ABSA). The ABSA process can be summarized in three main tasks: Aspect Term Extraction, Aspect-Term and Opinion-Word Separation, and Sentiment Polarity Classification. Most commonly, supervised learning approaches are used for ABSA. However, building tagged training and testing corpora for each language and each domain is highly time-consuming and can often be done only manually. For this reason, we use a semi-supervised model to design a language- and domain-independent system based on novel machine learning approaches, with a focus on analyzing Albanian texts and making use of Albanian data in the digital world. In this approach, which tries to extract the aspects and the polarity of their corresponding opinions through almost unsupervised learning, the biggest challenge is reaching high accuracy. To achieve this, a language-independent system must take into account the differences and similarities between languages.
In this paper, we aim to identify the biggest challenges that arise in Albanian in comparison with English. After analyzing a certain amount of data, we identified the following issues: inflections, negation, homonyms, dialects, irony, sarcasm, and the presence of stop words in aspect terms. This list is not exhaustive: we selected and discussed only the issues that have a greater impact on extracting aspect terms and opinions and that can strongly affect the accuracy of the final polarity classification of the texts.
