Abstract

This paper evaluates the annotation complexity of citation sentences in scientific and non-scientific articles to identify the main sources of that complexity, by performing sentiment analysis on corpora we developed separately for each domain. For this research, we selected different data sources to prepare the corpora, then manually annotated the citation sentences with polarity labels following our defined annotation guidelines. We developed a classification system to check the quality of the annotation work in both domains. The results show that the scientific domain yields more accurate classification than the non-scientific domain. We also explored the reasons for the lower accuracy and concluded that non-scientific text, especially in linguistics, is complex in nature, which leads to poor understanding and incorrect annotation.
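The abstract does not specify how annotation quality was quantified. As an illustration only, the minimal sketch below shows one common way to check agreement between two human annotators on a polarity scheme, using Cohen's kappa from scikit-learn; the three-class label set (positive / negative / neutral) and the label lists are assumptions, not the paper's data.

```python
# Illustrative sketch: inter-annotator agreement via Cohen's kappa.
# The polarity scheme and labels below are placeholder assumptions,
# not the paper's actual annotation data or method.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["positive", "neutral", "negative", "neutral", "positive"]
annotator_b = ["positive", "neutral", "neutral", "neutral", "positive"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0.0 = chance level
```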

Highlights

  • Sentiment analysis is a popular contemporary research area [14]

  • We explored the reasons for the lower classification accuracy on non-scientific data and concluded that non-scientific text, especially in linguistics, is complex in nature, leading to poor understanding and an error-prone annotation process

  • Human annotators faced considerable difficulty when annotating the non-scientific citation sentences because of their complexity



Introduction

Sentiment analysis is a popular research area in the current era [14], and researchers have widely used different types of textual data to perform it. To carry out this work, we needed to prepare experimental data sets for both domains. For the scientific corpus, we selected the Elsevier Computer & Operations Research journal and prepared a corpus of 5161 citation sentences extracted from 262 research papers published in 2015–2019. For the non-scientific corpus, we selected the SJR Applied Linguistics journal and prepared a corpus of 4989 citation sentences extracted from 250 research papers published in 2015–2019. The system's accuracy is evaluated with metrics such as F-score and accuracy, and improved using data preprocessing and feature selection techniques such as lemmatization, n-grams, tokenization, case normalization, and stop word removal.
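As an illustration only (not the authors' implementation), the sketch below wires several of the named steps together with scikit-learn: tokenization, case normalization, stop-word removal, and unigram/bigram features feed a naive Bayes classifier, whose predictions are then scored with accuracy and macro F-score. The citation sentences, labels, and classifier choice are placeholder assumptions; lemmatization (e.g., via NLTK's WordNetLemmatizer) is omitted to keep the example self-contained.

```python
# Illustrative sketch of the preprocessing and evaluation steps named above.
# All sentences and labels are placeholders, not the paper's corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

# Placeholder annotated citation sentences (train/test split kept trivial).
train_sents = ["This method clearly outperforms earlier approaches [3].",
               "The model proposed in [7] fails badly on noisy data.",
               "Smith et al. [12] describe a graph-based formulation.",
               "Their results in [5] are impressive and well validated.",
               "The evaluation in [9] is weak and hard to reproduce.",
               "The survey in [2] lists existing polarity lexicons."]
train_labels = ["positive", "negative", "neutral",
                "positive", "negative", "neutral"]
test_sents = ["The approach of [4] gives excellent accuracy.",
              "Unfortunately, [8] ignores class imbalance entirely."]
test_labels = ["positive", "negative"]

# Tokenization, case normalization (lowercase=True is the default),
# stop-word removal, and unigram+bigram features in one step.
vectorizer = CountVectorizer(lowercase=True, stop_words="english",
                             ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_sents)
X_test = vectorizer.transform(test_sents)

# Train a simple polarity classifier and evaluate with the named metrics.
clf = MultinomialNB().fit(X_train, train_labels)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(test_labels, pred))
print("macro F1:", f1_score(test_labels, pred, average="macro"))
```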
