Abstract

Daily interactions between children and their parents are crucial for spoken language skills and overall development. Capturing such interactions can help provide meaningful feedback to parents as well as practitioners. Capturing naturalistic audio of parent-child interactions and building a speech processing pipeline for it is a challenging problem. One of the first important steps in such a pipeline is Speaker Diarization: identifying who spoke when. Speaker Diarization partitions a captured audio stream into homogeneous segments, each attributed to a single speaker's identity (here, child or parent). Following ongoing COVID-19 restrictions and human subjects research IRB protocols, an unsupervised data collection approach was formulated to collect parent-child interactions (of consented families) using a LENA device, a lightweight audio recorder. Two interaction scenarios were explored: a book reading activity at home and spontaneous interactions in a science museum. To distinguish a child's speech from a parent's, we train the Diarization models on open-source adult speech data and child speech data acquired from the LDC (Linguistic Data Consortium). Various speaker embeddings (e.g., x-vectors, i-vectors, ResNets) will be explored. Results will be reported using Diarization Error Rate. [Work sponsored by NSF via Grant Nos. 1918032 and 1918012.]
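As an illustration of the reporting metric, Diarization Error Rate is conventionally defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the evaluation code used in this work. It assumes fixed-length frames, at most one speaker per frame, and hypothesis labels already expressed in the reference classes ("child", "parent"), so no optimal speaker mapping or scoring collar is applied. The function name and the example labels are hypothetical.

```python
from typing import Optional, Sequence


def diarization_error_rate(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Frame-level DER; None marks a non-speech frame (illustrative sketch)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1              # frame counts toward reference speech time
            if hyp is None:
                missed += 1          # speech present but not detected
            elif hyp != ref:
                confusion += 1       # speech detected, wrong speaker
        elif hyp is not None:
            false_alarm += 1         # speech reported where there is none

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Hypothetical labels, one per frame.
reference = ["parent", "parent", "child", "child", None, None, "child", "parent"]
hypothesis = ["parent", "child", "child", None, None, "parent", "child", "parent"]
der = diarization_error_rate(reference, hypothesis)
```

In this toy example there is one confused frame, one missed frame, and one false-alarm frame against six reference speech frames, so the rate is 3/6. Because false alarms are counted against reference speech time only, the value can exceed 1.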
