Discriminating normal from adventitious respiratory sounds in stethoscope auscultation is challenging for the human ear owing to their low-frequency content and the varying frequency ranges of inspiration and expiration. This makes the diagnosis of pulmonary disorders subjective, relying on the experience and hearing acuity of the physician. Most computer-assisted diagnosis systems formulated to address these limitations fail to capture the inherent acoustic properties of respiratory sounds in a manner close to the human auditory system. To circumvent this problem, this study exploits gammatone filter banks to evaluate the distribution of the frequency components present in the signal. To categorize the respiratory signals, a GoogLeNet-based Convolutional Neural Network (CNN) prediction model is developed using Time-Frequency (TF) visualizations of gammatone cepstral coefficients obtained from Intrinsic Mode Functions (IMFs) extracted by empirical mode decomposition. Since the performance of a CNN model depends strongly on its learning environment, this study also optimizes the hyperparameter values to enhance the classification performance of the CNN model. Accordingly, optimal values of the initial learning rate, L2 regularization, and momentum are identified for both the Stochastic Gradient Descent with Momentum (SGDM) and Adaptive Moment Estimation (ADAM) optimizers using Bayesian optimization. Experiments are carried out for three TF visualization methods, namely spectrograms, scalograms, and constant-Q spectrograms. The results show that scalogram-based classification yields accuracies of 87.37% and 88.32% with default hyperparameters for the SGDM and ADAM optimizers, respectively, and improved accuracies of 93.68% and 95.67% when the hyperparameters are selected optimally based on the objective function.
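As a rough illustrative sketch (not the authors' implementation, which uses gammatone cepstral coefficients of IMFs), the TF-visualization stage can be approximated by computing a log-scaled spectrogram image from an audio segment, which is the kind of 2-D input a GoogLeNet-style CNN would consume. The sampling rate, chirp parameters, and window sizes below are assumed for demonstration only.

```python
import numpy as np
from scipy import signal

# Synthetic stand-in for a respiratory sound recording (hypothetical parameters).
fs = 4000                          # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1 / fs)      # 2-second segment
# Low-frequency chirp mimicking the varying inspiration/expiration band,
# plus a small amount of noise.
x = signal.chirp(t, f0=100, f1=600, t1=2.0) + 0.05 * np.random.randn(t.size)

# Time-frequency visualization: a standard STFT spectrogram.
f, frames, Sxx = signal.spectrogram(x, fs=fs, nperseg=256, noverlap=128)
Sxx_db = 10 * np.log10(Sxx + 1e-12)  # log power, as typically fed to a CNN

# The (frequency bins x time frames) matrix serves as the CNN input image.
print(Sxx_db.shape)
```

A scalogram variant would replace the STFT with a continuous wavelet transform, and a constant-Q spectrogram with a constant-Q transform; the downstream CNN input shape is handled the same way.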