Infant cry classification by using different deep neural network models and hand-crafted features

Turgut Ozseven

doi:10.1016/j.bspc.2023.104648

Abstract

Crying is the way babies communicate with the outside world. A cry may reflect a need of the baby or may be an expression of a medical disorder. For this reason, infant cries are examined to support inexperienced parents and to enable early diagnosis when a medical disorder is present. Infant cry signals are classified either with signal processing methods such as hand-crafted features or with image processing methods based on the spectral image of the cry. In this study, we investigate the effect of using hand-crafted features and spectral images, individually and in combination, on the classification of infant cries. In this context, experiments were conducted with a 1D CNN model, transfer learning, texture analysis methods, hand-crafted features, and their combinations. Whereas most studies in the literature use two or three classes, this study uses all five classes in the dataset. Classification with hand-crafted and hybrid features was performed with SVM, RNN, and PNN. Hand-crafted features were also classified with the 1D CNN. GoogLeNet, ShuffleNet, and ResNet-18 were used for transfer learning in image-based classification. The results show that texture analysis methods are insufficient, whereas hand-crafted feature sets and spectrogram and scalogram images provide high success. The 1D CNN model showed lower success than the traditional classifiers and the transfer learning models; on DB2 in particular, the 1D CNN produced the lowest result of all experiments. The results were compared on two datasets commonly used in the literature, and success rates of 97.6% and 95.2% were achieved. These results show that both signal processing methods and spectrogram and scalogram images can be used successfully in infant cry classification studies.
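To make the distinction between the two input types concrete, the following is a minimal sketch of how one signal can be turned into a small hand-crafted feature vector, a spectrogram, and a hybrid of the two. It is an illustration only, not the feature set or pipeline used in the study: zero-crossing rate and short-time energy are stand-in features, the spectrogram uses a naive DFT, the hybrid is a simple concatenation, and the input is a synthetic 500 Hz tone rather than a cry recording. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Hann-windowed naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len=64, hop=32):
    """Time-by-frequency magnitude matrix (one row per frame)."""
    return [magnitude_spectrum(f) for f in frame_signal(signal, frame_len, hop)]


def handcrafted_features(signal, frame_len=64, hop=32):
    """Stand-in hand-crafted features: mean ZCR and mean short-time energy."""
    frames = frame_signal(signal, frame_len, hop)
    zcr = [zero_crossing_rate(f) for f in frames]
    energy = [short_time_energy(f) for f in frames]
    return [sum(zcr) / len(zcr), sum(energy) / len(energy)]


def hybrid_vector(signal, frame_len=64, hop=32):
    """Assumed hybrid: hand-crafted features plus the time-averaged spectrum."""
    spec = spectrogram(signal, frame_len, hop)
    mean_spectrum = [sum(col) / len(spec) for col in zip(*spec)]
    return handcrafted_features(signal, frame_len, hop) + mean_spectrum


# Synthetic stand-in for a recording: a 500 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(1024)]

spec = spectrogram(tone)
features = handcrafted_features(tone)
hybrid = hybrid_vector(tone)
```

In a setup like the one described above, a vector such as `features` or `hybrid` would go to a conventional classifier, while a matrix such as `spec` would be rendered as an image and passed to a pretrained network.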

Full Text