The best input feature when using convolutional neural network for cough recognition

Yunan Cai, Wenlong Xu

doi:10.1088/1742-6596/1865/4/042111

Abstract

In recent years, convolutional neural networks (CNNs) have been applied successfully to cough recognition. This approach converts audio clips into spectrograms and then classifies them with a CNN, which motivates a search for input representations that make training more effective and improve performance. In this article, we feed four different features into the CNN: the short-time Fourier transform (STFT) spectrogram, the Mel spectrogram, the Log-Mel spectrogram and Mel-frequency cepstral coefficients (MFCC). We compare their performance with identical network parameters; with the Mel spectrogram as input, the network achieves 92.5% classification accuracy. Second, we compare the classification performance of Mel spectrograms generated with different window sizes and frame shifts, using the same CNN. For a 320 ms cough segment, a window size of 64 with a frame shift of 32 gives the best performance.
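The four inputs compared in the abstract form a chain: the STFT magnitude is pooled into Mel bands, the Mel energies are log-compressed, and a discrete cosine transform of the log-Mel values gives the MFCCs. The NumPy sketch below illustrates that chain under stated assumptions; it is not the authors' implementation. It assumes that window size and frame shift are counted in samples, a hypothetical 16 kHz sample rate, a Hann window, 16 Mel bands and 13 MFCCs. Random noise stands in for a 320 ms cough recording, and all helper names (`frame_signal`, `extract_features`, etc.) are illustrative.

```python
import numpy as np

def frame_signal(x, win_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, win_len)

def stft_magnitude(x, win_len, hop):
    """Magnitude STFT with a Hann window; FFT size equals the window size."""
    frames = frame_signal(x, win_len, hop) * np.hanning(win_len)
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, win_len // 2 + 1)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters equally spaced on the Mel scale from 0 Hz to sr/2."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def extract_features(x, sr, win_len, hop, n_mels=16, n_mfcc=13):
    """Return the four candidate CNN inputs, each shaped (frames, coefficients)."""
    mag = stft_magnitude(x, win_len, hop)
    mel = (mag ** 2) @ mel_filterbank(sr, win_len, n_mels).T  # Mel band energies
    log_mel = np.log(mel + 1e-10)  # epsilon guards against empty bands
    mfcc = dct_ii(log_mel, n_mfcc)
    return {"stft": mag, "mel": mel, "log_mel": log_mel, "mfcc": mfcc}

sr = 16000  # hypothetical sample rate
x = np.random.default_rng(0).standard_normal(sr * 320 // 1000)  # 320 ms of noise
feats = extract_features(x, sr, win_len=64, hop=32)
for name, f in feats.items():
    print(name, f.shape)
```

Under these assumptions, a 320 ms segment is 5120 samples. A window of 64 with a shift of 32 therefore yields 1 + (5120 - 64) // 32 = 159 frames. Changing `win_len` and `hop` trades frequency resolution (bins per frame) against time resolution (frames per segment). That trade-off is what the window-size and frame-shift comparison explores. A CNN would usually take each matrix transposed to (frequency, time) as a single-channel image.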
