Abstract

The electroglottograph (EGG) is a device that measures the conductance between the vocal folds. Analysis of the EGG signal has many applications in the literature, such as speech-to-text synthesis, voice disorder analysis, emotion recognition, and speaker verification. An EGG device is therefore normally required to record vocal fold activity. As an alternative, this work proposes a method to synthesize the EGG waveform from the speech signal using a context aggregation convolutional neural network. The synthesis network is trained using deep feature losses computed with the help of a second network, the EGG classification network. The synthesized EGG signal must then be characterized. During voiced speech production, the instants at which the vocal folds attain complete closure are called glottal closure instants (GCIs); likewise, the instants at which they open are called glottal opening instants (GOIs). Such instants can be measured reliably from the EGG signal. The performance of the proposed method is compared with that of other state-of-the-art techniques. The CMU-Arctic database, a parallel corpus of simultaneously recorded speech and EGG signals, is used for training the synthesis network and for the comparison. The performance of extracting glottal instants from the synthesized EGG signals is found to be comparable to that of other methods.
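
As an illustration of how glottal instants can be read off an EGG waveform, the sketch below picks peaks in the differentiated EGG (dEGG), a classical approach and not necessarily the extraction procedure used in this work. It assumes a polarity in which the signal rises as vocal fold contact increases, so that the steepest rise in each cycle marks a GCI candidate and the steepest fall a GOI candidate. The function names, the threshold, the maximum-pitch parameter, and the synthetic test waveform are all hypothetical choices made for the example.

```python
import math


def differentiate(x):
    """First difference: d[i] = x[i + 1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def pick_peaks(d, threshold, min_gap):
    """Indices of local maxima of d that reach threshold.

    If two maxima are closer than min_gap samples, only the larger is kept.
    """
    peaks = []
    for i in range(1, len(d) - 1):
        if d[i] >= threshold and d[i] > d[i - 1] and d[i] >= d[i + 1]:
            if peaks and i - peaks[-1] < min_gap:
                if d[i] > d[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


def glottal_instants(egg, fs, max_f0=500.0, rel_threshold=0.3):
    """Return (gci, goi) sample indices from dEGG peaks.

    Assumed polarity: egg increases with vocal fold contact.
    max_f0 (Hz) bounds how close two instants of the same kind may be.
    """
    d = differentiate(egg)
    neg_d = [-v for v in d]
    min_gap = int(fs / max_f0)
    gci = pick_peaks(d, rel_threshold * max(d), min_gap)          # steepest rises
    goi = pick_peaks(neg_d, rel_threshold * max(neg_d), min_gap)  # steepest falls
    return gci, goi


def synthetic_egg_cycle(period=80):
    """One made-up EGG cycle: fast closing, closed plateau, slower opening."""
    cycle = []
    for n in range(period):
        if n <= 9:        # closing phase (steep rise)
            cycle.append(0.5 * (1.0 - math.cos(math.pi * n / 9)))
        elif n < 30:      # closed phase
            cycle.append(1.0)
        elif n <= 51:     # opening phase (gentler fall)
            cycle.append(0.5 * (1.0 + math.cos(math.pi * (n - 30) / 21)))
        else:             # open phase
            cycle.append(0.0)
    return cycle


fs = 8000                         # sampling rate in Hz (assumed)
egg = synthetic_egg_cycle() * 5   # five identical cycles, i.e. 100 Hz pitch
gci, goi = glottal_instants(egg, fs)
print("GCI candidates (samples):", gci)
print("GOI candidates (samples):", goi)
```

On real recordings the dEGG is noisier than this synthetic signal, so smoothing before differentiation and a voicing decision would typically be needed; both are omitted here for brevity.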
