Annotating large speech corpora: building on the experience of Marsec

Gerry Knowles

doi:10.7146/hjlcb.v7i13.25076

Open Access

Journal: HERMES - Journal of Language and Communication in Business
Publication Date: Jan 4, 2017
License type: cc-by

#Large Amounts Of Speech Data #Database Techniques

Abstract

This paper discusses a methodology for processing large amounts of speech data using database techniques, applying the lessons learned in compiling the Marsec database. The methodology is offered as an alternative to the conventional method, which processes the orthographic transcription using only techniques designed for written texts. Past practice might suggest that the first step in processing spoken texts is to make phonemic and prosodic transcriptions, but it is argued that these are not in fact necessary. Given an appropriate organisation of the data, much of the information in conventional transcriptions is predictable, and human expertise is required only to add unpredictable supplementary annotations.
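
The division of labour described in the abstract can be illustrated with a minimal sketch. It is not the Marsec schema or the author's implementation: the table names (`token`, `lexicon`, `annotation`), the phonemic strings and the `nuclear-fall` label are all hypothetical, chosen only to show how a predictable transcription might be derived by lookup while a human-supplied annotation is stored separately.

```python
import sqlite3


def build_demo_corpus():
    """Create an in-memory database with hypothetical tables (not Marsec's)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE token (id INTEGER PRIMARY KEY, position INTEGER, orth TEXT);
        CREATE TABLE lexicon (orth TEXT PRIMARY KEY, phonemic TEXT);
        CREATE TABLE annotation (token_id INTEGER REFERENCES token(id), label TEXT);
    """)
    # Orthographic tokens: the only text that has to be keyed in by hand.
    con.executemany(
        "INSERT INTO token (position, orth) VALUES (?, ?)",
        [(1, "good"), (2, "morning")],
    )
    # Pronouncing lexicon: illustrative phonemic strings, assumed for the demo.
    con.executemany(
        "INSERT INTO lexicon VALUES (?, ?)",
        [("good", "gUd"), ("morning", "mO:nIN")],
    )
    # Unpredictable information added by a human annotator (hypothetical label).
    con.execute("INSERT INTO annotation VALUES (?, ?)", (2, "nuclear-fall"))
    return con


def derived_transcription(con):
    """Join tokens with the lexicon and any supplementary annotations."""
    return con.execute("""
        SELECT t.orth, l.phonemic, a.label
        FROM token AS t
        LEFT JOIN lexicon AS l ON l.orth = t.orth
        LEFT JOIN annotation AS a ON a.token_id = t.id
        ORDER BY t.position
    """).fetchall()


if __name__ == "__main__":
    for row in derived_transcription(build_demo_corpus()):
        print(row)
```

In this sketch the phonemic column is never stored per token; it is recovered by a join whenever it is needed, and only the `annotation` table holds information that required human judgement.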
