Abstract

This work describes the discourse markers present in two European Portuguese corpora from different domains (university lectures and map-task dialogues). We also perform a multiclass automatic classification task based on prosodic features to determine, in both corpora, which words are discourse markers, which are disfluencies, and which are sentence-like units (SUs). Results show that the selection of discourse markers varies across domains and between speakers. In the classification task, discourse markers are classified better in the lecture corpus (87%) than in the dialogue corpus (84%). However, cross-domain experiments showed that models trained on the dialogue corpus better predict the events in the lecture corpus, since this domain displays more speakers and therefore more complex patterns. In both corpora, markers are more easily classified as SUs than as disfluencies.

Highlights

  • This work describes the discourse markers present in two corpora for European Portuguese, in different domains

  • In automatic speech processing, punctuation marks (which delimit sentence-like units, SUs), disfluencies, and discourse markers belong to a set of events known as structural metadata events

  • The aim is to automatically recover punctuation and capitalization at sentence boundaries, and to annotate and filter disfluencies and discourse markers

Summary

Classificação prosódica de marcadores discursivos (Prosodic classification of discourse markers)

Vera Cabarrão[1, 2], Helena Moniz[1, 2], Jaime Ferreira[1], Fernando Batista[1, 3], Isabel Trancoso[1, 4], Ana Isabel Mata[2], Sérgio Curto[1]

[Table columns: classified as SUs, disfluencies (Disf), discourse markers (MarcDisc)]
Findings
Dialogues to Lectures
