Abstract

Physical exertion is a stress condition that affects how we normally produce speech: it alters both the temporal and the spectral patterns of speech. Speech utterances can therefore serve as a cost-effective telehealth means of detecting whether a person is under physical exertion. This paper addresses the detection of the shortness-of-breath (out-of-breath) condition from speech under physical load using a multi-task learning (MTL) framework. The primary classification targets are the neutral and out-of-breath classes. These targets are inherently binary and do not reflect the actual extent of exertion. This motivates a novel auxiliary learning target, created in collaboration with a pre-trained expert system, whose values indicate the level of exertion under physical load. For both the convolutional neural network (CNN) and the CNN-long short-term memory (CLSTM) network, the MTL framework performs nearly 1.5% better (F1-score) than the single-task learning (STL) framework. In addition, the out-of-breath condition has more influence on the lower frequency spectrum. Warped spectrograms are therefore given as input to the networks, enabling the deep networks to focus on the lower spectral regions. The non-linear frequency warping is achieved with the Mel-scale transformation and the constant-Q transform (CQT). CQT, being less dependent on a fixed window size, shows an improvement of at least 6.57% (F1-score) over Mel-based inputs. The MTL framework, combined with the warped spectrum, performs better at classifying out-of-breath speech versus neutral speech.
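
The abstract does not state how the primary (neutral vs. out-of-breath) and auxiliary (exertion-level) objectives are combined during training. A common MTL choice is a weighted sum of per-task losses, sketched below under stated assumptions: the auxiliary target is treated as a discrete exertion level scored with categorical cross-entropy, and the names `mtl_loss` and `aux_weight` are hypothetical, not taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(p_out_of_breath: float, label: int) -> float:
    """Primary task loss: label 0 = neutral, 1 = out-of-breath."""
    p = min(max(p_out_of_breath, EPS), 1.0 - EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def categorical_cross_entropy(probs: list[float], level: int) -> float:
    """Auxiliary task loss over hypothetical discrete exertion levels."""
    return -math.log(max(probs[level], EPS))


def mtl_loss(p_primary: float, primary_label: int,
             aux_probs: list[float], aux_level: int,
             aux_weight: float = 0.5) -> float:
    """Hypothetical MTL objective: primary loss plus a weighted auxiliary loss.

    Both heads are assumed to share one CNN/CLSTM trunk, so gradients from the
    auxiliary exertion-level target also shape the shared features.
    """
    primary = binary_cross_entropy(p_primary, primary_label)
    auxiliary = categorical_cross_entropy(aux_probs, aux_level)
    return primary + aux_weight * auxiliary


# Example: confident out-of-breath prediction, auxiliary head favouring level 2.
loss = mtl_loss(0.8, 1, [0.1, 0.2, 0.7], 2, aux_weight=0.5)
```

Setting `aux_weight=0` recovers a single-task (STL) objective, which is one simple way to frame the STL vs. MTL comparison the abstract reports.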

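Both warpings mentioned in the abstract allocate more resolution to low frequencies. The sketch below uses textbook definitions, not necessarily the paper's configuration: the common HTK-style Mel formula `2595 * log10(1 + f / 700)` (other Mel variants exist), and CQT bins with geometric centre frequencies `f_k = f_min * 2**(k / B)` and a constant quality factor `Q`. All function names and parameter values are illustrative assumptions. The CQT window length `N_k = Q * fs / f_k` shrinks as frequency rises, which is what "less dependent on a fixed window size" refers to.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale (one of several common variants)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_max: float, n_bands: int) -> list[float]:
    """Band edges equally spaced on the Mel scale, returned in Hz."""
    m_max = hz_to_mel(f_max)
    return [mel_to_hz(m_max * i / n_bands) for i in range(n_bands + 1)]


def cqt_bins(f_min: float, n_bins: int, bins_per_octave: int,
             sample_rate: float) -> list[tuple[float, float]]:
    """(centre frequency in Hz, window length in samples) for each CQT bin."""
    # Constant ratio of centre frequency to bandwidth for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = q * sample_rate / f_k  # longer windows at low frequencies
        bins.append((f_k, n_k))
    return bins


# Illustrative settings (hypothetical): 16 kHz audio, 10 Mel bands, 4 octaves of CQT.
edges = mel_band_edges(8000.0, 10)
mel_widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
cqt = cqt_bins(32.7, 48, 12, 16000.0)
```

The Mel band widths grow with frequency, so low-frequency regions get finer bands; the CQT goes further by also lengthening the analysis window at low frequencies instead of using one fixed window for all bins.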