Modality annotation for Portuguese: from manual annotation to automatic labeling

Amália Mendes, João Sequeira, Paulo Quaresma, Iris Hendrickx, Teresa Gonçalves, Luciana Ávila

doi:10.33011/lilt.v14i.1401

Abstract

We investigate modality in Portuguese, combining a linguistic perspective with an application-oriented perspective. We design an annotation scheme reflecting theoretical linguistic concepts and apply it to a small corpus sample to show how the scheme deals with real-world language usage. We present two schemes for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool that detects modality in text automatically. The modality tagger uses a machine learning classifier trained on features extracted automatically from the output of a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity, the semantic and syntactic properties typically associated with one modal meaning in context, and the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.

Highlights

There has been a growing interest in text mining applications that can automatically detect opinions, facts and sentiments in texts.
We focus instead on modal verbs with multiple modal meanings that each occur at least 5 times in the small annotated corpus sample.
The range of phenomena not considered as modality is very similar in the European Portuguese (EP) and Brazilian Portuguese (BP) schemes, and so are the components of both modality schemes.
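
The verb selection criterion mentioned in the highlights can be illustrated with a minimal sketch. It assumes one reading of that criterion: a verb qualifies when at least two of its modal meanings each occur 5 or more times. The function name `select_ambiguous_verbs`, the `(verb, meaning)` pair format, the meaning labels, and the toy counts are all hypothetical and are not taken from the paper's corpus or code.

```python
from collections import Counter, defaultdict


def select_ambiguous_verbs(annotations, min_count=5):
    """Keep verbs that have at least two modal meanings,
    each attested at least `min_count` times.

    `annotations` is an iterable of (verb, meaning) pairs
    (a hypothetical format, assumed for this sketch).
    """
    # Count how often each meaning is annotated for each verb.
    counts = defaultdict(Counter)
    for verb, meaning in annotations:
        counts[verb][meaning] += 1

    selected = {}
    for verb, meaning_counts in counts.items():
        # Retain only the meanings that reach the frequency threshold.
        frequent = {m: n for m, n in meaning_counts.items() if n >= min_count}
        # A verb is ambiguous enough only if two or more meanings remain.
        if len(frequent) >= 2:
            selected[verb] = frequent
    return selected


# Toy data, invented for illustration only.
toy = (
    [("poder", "epistemic")] * 6
    + [("poder", "deontic")] * 5
    + [("dever", "deontic")] * 7
    + [("dever", "epistemic")] * 2
)
selected = select_ambiguous_verbs(toy)
```

With this toy input, "poder" is kept because both of its meanings reach the threshold, while "dever" is dropped because only one of its meanings does.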

Summary

Introduction

There has been a growing interest in text mining applications that can automatically detect opinions, facts and sentiments in texts. Modality, defined from a linguistic perspective as the speaker’s attitude towards the proposition in the text (Palmer, 1986), offers a theoretical framework for making more fine-grained distinctions between different attitudes. In this paper we aim to combine a linguistic perspective with a practical, application-oriented perspective on modality. On the one hand, we design an annotation scheme reflecting theoretical linguistic concepts and apply it to corpus data to fit real-world language usage. On the other hand, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically.


Journal: Linguistic Issues in Language Technology
Publication Date: Aug 1, 2016
Citations: 13
License type: cc-by
