Abstract

In this paper, we present different resources for the study of spoken Brazilian Portuguese developed within the C-ORAL-BRASIL project. The C-ORAL-BRASIL project stemmed from the European C-ORAL-ROM project (Cresti & Moneglia, 2005), which compiled spoken corpora of Italian, French, Spanish, and European Portuguese. The corpora of the C-ORAL family are adequate tools for the analysis of spoken language because they provide not only the transcripts of the recorded sessions, annotated for prosodic breaks, but also the corresponding audio files and the text-to-speech alignment. So far, the C-ORAL-BRASIL project has published C-ORAL-BRASIL I (Informal corpus: Raso & Mello, 2012), while C-ORAL-BRASIL II (to be published by 2019) comprises a Formal corpus (Natural context), a Media corpus, and a Telephonic corpus. Besides these resources, a set of informationally tagged comparable minicorpora (representative samples of the aforementioned corpora) is already available or in preparation, enabling studies of information structure, including cross-linguistic ones.

Summary

The C-ORAL family of spoken corpora: a brief review

The C-ORAL-ROM project (Cresti & Moneglia, 2005) aimed at compiling comparable resources for the study of spontaneous speech in European Romance languages, resulting in the publication of four spoken corpora: the Italian corpus by the LABLITA lab at Florence, along with the French, Spanish, and European Portuguese corpora. All corpora were planned as effective tools for the study of spoken language: besides the transcripts of the recording sessions, they provide the audio files and the text-to-speech alignment through the WinPitch software (Martin, 2003; see Figure 1). Prosodic information is of paramount importance for the study of spoken language, since prosody is regarded as a primary means of conveying meaning in speech at many levels (semantic, pragmatic, and so on). The global architecture of the C-ORAL resources comprises four corpora: the Informal, Formal, Media, and Telephone corpora. Following the same theoretical and methodological framework, the more recent C-ORAL-BRASIL corpora could benefit from technological and methodological advances, as will be shown throughout the following sections.

The C-ORAL-BRASIL Project
C-ORAL-BRASIL I
C-ORAL-BRASIL II
Findings
Conclusions