Abstract

Phonetic analysis is labor-intensive, limiting the amount of data that can be considered. Automated techniques (e.g., forced alignment based on Automatic Speech Recognition, ASR) have recently emerged that allow for larger-scale analysis. While forced alignment can be accurate for adult speech (e.g., Yuan & Liberman, 2009), ASR techniques remain a challenge for child speech (Benzeghiba et al., 2007). We used a trainable forced aligner (Gorman et al., 2011) to examine the effect of four factors on alignment accuracy with child speech: (1) dataset, from CHILDES (MacWhinney, 2000): spontaneous speech (single child) vs. picture naming (multiple children, Paidologos data); (2) phonetic transcription: manual, automatic, or CMU dictionary (Weide, 1998); (3) training data: adult lab data, one dataset of child data, all child data, or child and adult lab data; (4) segment: voiceless stops, voiceless sibilants, or vowels. Automatically generated alignments were compared to hand segmentations. While there were limits on accuracy, better results were generally obtained with (1) picture naming, (2) manual phonetic transcription, (3) training data including child speech, and (4) voiceless stops. These four factors increase the utility of analyzing children's speech production using forced alignment, potentially allowing researchers to conduct larger-scale studies that would not otherwise be feasible.
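The comparison of automatic alignments against hand segmentations is typically scored by boundary displacement. A minimal sketch of that scoring, assuming each alignment is a list of (label, start, end) tuples in seconds; the function names and the 20 ms tolerance are illustrative assumptions, not details from the paper:

```python
# Hypothetical scoring sketch: compare forced-alignment output to a hand
# segmentation by measuring boundary offsets. Alignments are assumed to be
# lists of (label, start_sec, end_sec) tuples with matching segments.

def boundary_errors(auto, manual):
    """Absolute offsets (seconds) between corresponding segment boundaries."""
    assert len(auto) == len(manual), "alignments must cover the same segments"
    errors = []
    for (_, a_start, a_end), (_, m_start, m_end) in zip(auto, manual):
        errors.append(abs(a_start - m_start))
        errors.append(abs(a_end - m_end))
    return errors

def accuracy_within(auto, manual, tol=0.020):
    """Share of boundaries within a tolerance (20 ms is a common choice)."""
    errs = boundary_errors(auto, manual)
    return sum(e <= tol for e in errs) / len(errs)

# Toy example: three of four boundaries fall within 20 ms of the hand labels.
auto = [("t", 0.10, 0.18), ("a", 0.18, 0.35)]
manual = [("t", 0.11, 0.19), ("a", 0.19, 0.40)]
print(accuracy_within(auto, manual))  # 0.75
```

Aggregating such scores per dataset, transcription source, training set, and segment type is one way the four factors above could be compared.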
