Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

Ling He, Jing Zhang, Jiang Zhang, Yin Liu, Heng Yin, Junpeng Zhang, Philip Allen

doi:10.1371/journal.pone.0184267

Open Access

https://doi.org/10.1371/journal.pone.0184267

Abstract

Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, a syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, initials and finals are the smallest speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final (I/F) segmentation method is proposed. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which has the largest population of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method requires no training stage and performs well for both voiced and unvoiced speech. Then, the syllables are classified into those with “quasi-unvoiced” initials and those with “quasi-voiced” initials, and a separate initial/final segmentation method is proposed for each of these two types of syllables. Moreover, the segmentation is carried out in two steps: the rough locations of the syllable and initial/final boundaries are refined in the second step in order to improve the robustness of the segmentation. The experiments show that the initial/final segmentation accuracy is higher for syllables with quasi-unvoiced initials than for syllables with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all the syllables is 91.69%. For the control samples, P30 for all the syllables is 91.24%.
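The two figures of merit quoted above can be illustrated with a small sketch. It assumes that the mean time error is the average absolute difference between automatically detected and manually labelled boundaries, and that P30 is the percentage of boundaries whose error does not exceed 30 ms; the abstract does not define either measure, so both definitions, the function names, and the sample values are assumptions made for illustration.

```python
def boundary_errors_ms(detected_ms, reference_ms):
    """Absolute time error (ms) between detected and reference boundaries."""
    if len(detected_ms) != len(reference_ms):
        raise ValueError("detected and reference lists must have equal length")
    return [abs(d - r) for d, r in zip(detected_ms, reference_ms)]


def mean_time_error_ms(errors_ms):
    """Average absolute boundary error in milliseconds."""
    return sum(errors_ms) / len(errors_ms)


def tolerance_accuracy(errors_ms, tolerance_ms=30.0):
    """Percentage of boundaries within the tolerance (assumed meaning of P30)."""
    within = sum(1 for e in errors_ms if e <= tolerance_ms)
    return 100.0 * within / len(errors_ms)


# Hypothetical boundary positions in milliseconds, not data from the paper.
detected = [102.0, 250.0, 395.0, 540.0]
reference = [100.0, 245.0, 430.0, 541.0]

errors = boundary_errors_ms(detected, reference)
print(mean_time_error_ms(errors))   # mean absolute error in ms
print(tolerance_accuracy(errors))   # share of boundaries within 30 ms, in percent
```

With these made-up values, three of the four boundaries fall within the 30 ms tolerance, so a single large miss lowers the tolerance-based accuracy while also pulling the mean error up.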

Highlights

Cleft Palate (CP) is a common congenital malformation caused by craniofacial alteration
The I/F segmentation is implemented in two steps: syllable segmentation and I/F segmentation
Because some initials are very short, the duration of a speech frame in this work is chosen to be shorter than the usual frame length, so that the I/F boundaries in cleft palate speech can be located more accurately
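The effect of the frame duration on boundary resolution can be sketched as follows. The sampling rate and the frame and hop durations used here are hypothetical, since the highlights do not state the values chosen by the authors; the sketch only shows that shorter frames give a finer time grid on which a boundary can be placed.

```python
def frame_signal(samples, sample_rate_hz, frame_ms, hop_ms):
    """Split a signal into fixed-length frames of frame_ms, advancing by hop_ms."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop_len = int(round(sample_rate_hz * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# One second of silence at an assumed 16 kHz sampling rate.
signal = [0.0] * 16000

# A conventional setting (assumed): 20 ms frames with a 10 ms hop.
usual = frame_signal(signal, 16000, frame_ms=20, hop_ms=10)

# A shorter setting (assumed): 5 ms frames with a 5 ms hop.
short = frame_signal(signal, 16000, frame_ms=5, hop_ms=5)

print(len(usual), len(short))
```

A boundary found at frame level can only be placed on the hop grid, so a very short initial may fit inside a single long frame and be missed, while the shorter setting spreads it over several frames.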

Summary

Introduction

Cleft Palate (CP) is a common congenital malformation caused by craniofacial alteration. Because some initials are very short, the duration of a speech frame in this work is chosen to be shorter than the usual frame length, so that the I/F boundaries in cleft palate speech can be located more accurately. The proposed system contains two main procedures: syllable extraction and I/F segmentation. For syllables with quasi-voiced initials, the segmentation method is based on the short-time autocorrelation and on the difference in waveform shape between initials and finals. For syllables with quasi-unvoiced initials, a two-step segmentation method is proposed: the rough I/F boundaries are located first and are then refined.
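As a rough illustration of why short-time autocorrelation is informative here, the sketch below computes a normalized autocorrelation peak for a single frame. It is not the authors' segmentation method: the function names, the pitch search range, the sampling rate, and the synthetic test signals are all assumptions. The point is only that a periodic, voiced-like frame gives a peak near 1 at its pitch lag, while a noise-like frame does not.

```python
import math
import random


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of one frame at a given lag (range -1 to 1)."""
    n = len(frame) - lag
    if lag < 0 or n <= 0:
        raise ValueError("lag must be non-negative and shorter than the frame")
    head = frame[:n]
    tail = frame[lag:]
    numerator = sum(a * b for a, b in zip(head, tail))
    energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return numerator / energy if energy > 0.0 else 0.0


def periodicity_peak(frame, sample_rate_hz, min_f0_hz=80.0, max_f0_hz=400.0):
    """Largest normalized autocorrelation over lags in an assumed pitch range."""
    min_lag = int(sample_rate_hz / max_f0_hz)
    max_lag = min(int(sample_rate_hz / min_f0_hz), len(frame) - 1)
    return max(normalized_autocorrelation(frame, lag)
               for lag in range(min_lag, max_lag + 1))


SAMPLE_RATE_HZ = 8000   # assumed sampling rate
FRAME_LEN = 240         # 30 ms at 8 kHz, an assumed analysis window

# Voiced-like frame: a 200 Hz sinusoid (period of 40 samples).
voiced = [math.sin(2.0 * math.pi * 200.0 * i / SAMPLE_RATE_HZ)
          for i in range(FRAME_LEN)]

# Unvoiced-like frame: uniform noise from a seeded generator.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

print(periodicity_peak(voiced, SAMPLE_RATE_HZ))
print(periodicity_peak(noise, SAMPLE_RATE_HZ))
```

Tracking such a peak frame by frame gives one cue for where a periodic final begins; the summary indicates that the authors combine the autocorrelation with waveform shape differences, which this sketch does not attempt.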

Experiments and results

Findings

Conclusions and discussions