Automated detection of sigmatism using deep learning applied to multichannel speech signal

Michal Krecichwost, Natalia Mocko, Pawel Badura

doi:10.1016/j.bspc.2021.102612

Abstract

This paper presents a system for the analysis of acoustic data for the computer-aided diagnosis and therapy of sigmatism in children. The analysis focuses on the detection and recognition of selected articulation disorders in sibilant sounds. The system relies on a dedicated data acquisition device that records the speech signal using 15 microphones spatially arranged around the speaker's mouth. The collected speech corpus contains 923 samples of the /s/ and /ʃ/ consonants from 98 five- and six-year-old children with either normative or pathological pronunciation features. Each recording is supplemented with a detailed speech therapy annotation. A dedicated multibranch convolutional neural network architecture was designed for speech sample classification. Filter bank energy feature maps are extracted from each channel, along with their first- and second-order time derivatives. The feature maps are aggregated along different dimensions into a four-dimensional data structure, called the acoustic volume, which serves as the input to the deep network. We propose three ways to aggregate the multichannel data into the acoustic volume and two data augmentation techniques that enlarge the available dataset and help avoid overfitting. Classification experiments involving different data subsets demonstrate the system's ability to detect the analyzed pronunciation disorders with reasonable accuracy. The configuration in which the speech data are spatially organized into five channels yields the most effective classification.
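The abstract does not say how the time derivatives are computed or how the axes of the acoustic volume are ordered. The following is a minimal NumPy sketch of one way to build such a volume. It assumes precomputed log filter bank energy maps of shape (frames, bands) for each channel. It also assumes the common regression-based delta formula with a window of ±2 frames and a layout of (channels, frames, bands, 3). The helper names `delta`, `group_channels`, and `acoustic_volume` are hypothetical. The averaging of the 15 microphones into five groups of three is likewise hypothetical and is not one of the authors' three aggregation schemes.

```python
import numpy as np
from typing import Sequence


def delta(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, bands) feature map.

    d[t] = sum_{k=1..n} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..n} k^2),
    with the first and last frames replicated at the edges (an assumption).
    """
    frames = features.shape[0]
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(features.shape, dtype=float)
    for k in range(1, n + 1):
        # padded[t + n + k] is c[t + k]; padded[t + n - k] is c[t - k]
        out += k * (padded[n + k:n + k + frames] - padded[n - k:n - k + frames])
    return out / denom


def group_channels(channel_maps: Sequence[np.ndarray],
                   groups: Sequence[Sequence[int]]) -> list:
    """Hypothetical spatial grouping: average the microphone maps within each group."""
    return [np.mean([channel_maps[i] for i in g], axis=0) for g in groups]


def acoustic_volume(channel_maps: Sequence[np.ndarray], n: int = 2) -> np.ndarray:
    """Stack each channel's map with its first and second deltas.

    Returns an array of shape (channels, frames, bands, 3); this axis order
    is an assumption, not the paper's specification.
    """
    layers = []
    for fbe in channel_maps:
        d1 = delta(fbe, n)   # first-order time derivative
        d2 = delta(d1, n)    # second-order time derivative
        layers.append(np.stack([fbe, d1, d2], axis=-1))
    return np.stack(layers, axis=0)
```

In the usage below, 15 random maps of 100 frames by 40 bands are grouped into five channels. The result is a volume of shape (5, 100, 40, 3):

```python
maps = [np.random.rand(100, 40) for _ in range(15)]
groups = [list(range(i, i + 3)) for i in range(0, 15, 3)]
volume = acoustic_volume(group_channels(maps, groups))
```

Keeping the static, delta, and delta-delta maps as a trailing axis lets a convolutional branch treat them like image color planes. A multibranch network could instead assign one branch per channel or per derivative order.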

Full Text