Abstract

This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.

Highlights

  • Sentiment analysis has been one of the most popular applications of natural language processing for over a decade, both in academic research institutions and in industry

  • Flashback was highly infrequent in all the corpora (0.12%, 0.06% and 0% for French movie/product reviews (FMR), French news reactions (FNR) and English movie reviews (EMR), respectively); second, the relation Unknown was not used in EMR, since the discourse annotation in this corpus was performed by consensus

  • More annotations are needed to validate this assertion. In Section 5.3 (Impact of discourse on sentiment analysis), we attempt to answer the questions raised in the introduction of this paper: What is the role of discourse relations in subjectivity analysis? What is the impact of the discourse structure in determining the overall opinion conveyed by a document? Does a discourse-based approach really bring additional value compared to a classical bag-of-words approach? Does this additional value depend on corpus genre? To this end, we explored the interactions between the discourse, segment and opinion expression annotation layers


Summary

Introduction

Sentiment analysis has been one of the most popular applications of natural language processing for over a decade, both in academic research institutions and in industry. In this domain, researchers analyze how people express their sentiments, opinions and points of view from natural language data such as customer reviews, blogs, fora and newspapers.

Correction concerns complex discourse units (CDUs) in most cases (55%). This relation links segments that share a common topic, where the second argument corrects the information given in the first argument, often as a long-distance attachment (see the Correction in Example (16)). Another interesting behavior comes from the Contrast relation. Example (17) illustrates a Contrast with scope over two CDUs.
