Abstract

We describe the transcription and forced alignment of the Digital Archive of Southern Speech (DASS), a project that will provide a large corpus of historical, semi-spontaneous Southern speech for acoustic analysis. The 372 hours of recordings (64 interviews) form a subset of the Linguistic Atlas of the Gulf States, an extensive dialect study of 1121 speakers conducted across eight southern U.S. states from 1968 to 1983. Full DASS interviews are transcribed orthographically by hand, following in-house guidelines that ensure consistency across files and transcribers. Separate codes mark interviewee speech, interviewer speech, non-speech, overlapping speech, and unintelligible speech. Transcriber output is converted to Praat TextGrids using LaBB-CAT, a tool for maintaining large speech corpora. TextGrids containing only the interviewee’s speech are then generated and submitted to forced alignment with DARLA, which accommodates the levels of variation and noise in the DASS files with a high degree of success. Toward acoustic analysis, we evaluate three methods for vowel formant extraction: the native output of DARLA, a local implementation of FAVE-Extract, and a Praat-based extractor that incorporates separate formant tracks for different regions of the vowel space. We present this workflow of transcription and analysis to benefit other projects of similar size and scope.
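
As an illustration of the interviewee-only step, the following is a minimal sketch of how time-stamped, speaker-coded segments might be reduced to a single-tier Praat TextGrid before forced alignment. It is not the LaBB-CAT export or the project's own script: the `Segment` type, the speaker labels `"interviewee"` and `"interviewer"`, and both function names are hypothetical, and the sketch assumes segment times are already known in seconds.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One coded stretch of transcript (hypothetical structure)."""
    speaker: str  # assumed labels, e.g. "interviewee" or "interviewer"
    start: float  # seconds
    end: float    # seconds
    text: str


Interval = Tuple[float, float, str]


def interviewee_intervals(segments: List[Segment], duration: float,
                          speaker: str = "interviewee") -> List[Interval]:
    """Keep one speaker's segments and pad the gaps with empty intervals,
    so the tier covers [0, duration] without holes, as Praat expects."""
    kept = sorted((s for s in segments if s.speaker == speaker),
                  key=lambda s: s.start)
    intervals: List[Interval] = []
    cursor = 0.0
    for seg in kept:
        if seg.start > cursor:
            intervals.append((cursor, seg.start, ""))  # silence or other speaker
        start = max(seg.start, cursor)  # clip any overlap with the previous segment
        if seg.end > start:
            intervals.append((start, seg.end, seg.text))
            cursor = seg.end
    if cursor < duration:
        intervals.append((cursor, duration, ""))
    return intervals


def to_textgrid(intervals: List[Interval], duration: float,
                tier_name: str = "interviewee") -> str:
    """Serialize the intervals as a long-format Praat TextGrid with one tier."""
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {duration}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        '        xmin = 0',
        f'        xmax = {duration}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Segment("interviewer", 0.0, 1.5, "where were you born"),
        Segment("interviewee", 1.5, 3.0, "right here"),
    ]
    print(to_textgrid(interviewee_intervals(demo, 4.0), 4.0))
```

Padding the gaps with empty intervals keeps the interviewer's turns out of the aligner's input while preserving the original time axis, so aligned vowel boundaries can still be located in the full recording.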
