Abstract

Lung and heart sound classification is challenging because of the complex nature of audio data and its dynamic properties in the time and frequency domains. Detecting lung or heart conditions is also very difficult when the data are scarce, unbalanced, or noisy. Furthermore, data quality is a considerable pitfall for improving the performance of deep learning. In this paper, we propose a novel feature-based fusion network, FDC-FS, for classifying heart and lung sounds. The FDC-FS framework aims to effectively transfer learning from three different deep neural network models built from audio datasets. The novelty of the proposed transfer learning lies in the transformation from audio data to image vectors, and from three specific models to one fused model better suited to deep learning. We used two publicly available datasets for this study: lung sound data from the ICBHI 2017 challenge and heart challenge data. We applied data augmentation techniques, such as noise distortion, pitch shift, and time stretching, to address some data issues in these datasets. Importantly, we extracted three distinct features from the audio samples: Spectrogram, MFCC, and Chromagram. Finally, we built a fusion of three optimal convolutional neural network models by feeding them the image feature vectors transformed from the audio features. The proposed fusion model outperforms state-of-the-art approaches: the highest accuracy achieved with FDC-FS is 99.1% for Spectrogram-based lung sound classification, and 97% for Spectrogram- and Chromagram-based heart sound classification.
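The abstract names three augmentation techniques applied to the audio samples: noise distortion, pitch shift, and time stretching. The paper does not give implementation details, so the following is a minimal numpy-only sketch of each technique under simple assumptions (Gaussian additive noise; stretching and shifting by plain linear resampling, without the phase vocoder a production pipeline would use). All function names and parameters are illustrative, not from the paper.

```python
import numpy as np

def add_noise(signal, noise_factor=0.005, rng=None):
    """Noise distortion: add Gaussian noise scaled by noise_factor."""
    rng = rng or np.random.default_rng(0)
    return signal + noise_factor * rng.standard_normal(len(signal))

def time_stretch(signal, rate=1.2):
    """Time stretching by linear resampling: rate > 1 shortens the clip.
    (A phase vocoder would preserve pitch; this simple sketch does not.)"""
    n_out = int(len(signal) / rate)
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

def pitch_shift(signal, semitones=2.0):
    """Naive pitch shift: resample by 2**(semitones/12), then zero-pad
    or trim back to the original length so downstream shapes match."""
    rate = 2.0 ** (semitones / 12.0)
    shifted = time_stretch(signal, rate=rate)
    out = np.zeros_like(signal)
    n = min(len(shifted), len(signal))
    out[:n] = shifted[:n]
    return out
```

In practice, a library such as librosa provides higher-quality versions of the stretch and shift operations; the sketch above only illustrates the effect of each transform on the waveform.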

Highlights

  • Lung and heart sound classification is challenging because of the complex nature of audio data and its dynamic properties in the time and frequency domains

  • The highest accuracy, 97%, is achieved with Spectrogram, while Mel-frequency cepstral coefficients (MFCC) reach 91% and Chromagram reaches 95%

  • Chromagram achieved an accuracy of 89% with FDC-1 and FDC-2



Introduction

Lung and heart sound classification is challenging because of the complex nature of audio data and its dynamic properties in the time and frequency domains. It is very difficult to detect lung or heart conditions with small, unbalanced, or noisy data. We propose a novel feature-based fusion network, FDC-FS, for classifying heart and lung sounds. The FDC-FS framework aims to effectively transfer learning from three different deep neural network models built from audio datasets. We built a fusion of three optimal convolutional neural network models by feeding them the image feature vectors transformed from audio features. The auscultatory method has been widely applied by physicians to examine lung sounds associated with different respiratory symptoms. Wheezing sounds could not be accurately identified within a series of pulmonary disease sounds [6].
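The introduction states that three CNN models are fused by feeding them image feature vectors derived from the audio features. The paper excerpt does not specify the fusion mechanism, so the sketch below illustrates one common assumption, feature-level fusion: per-branch embedding vectors (one per input representation) are concatenated and passed through a shared classifier head. Embedding sizes, the class count, and the random weights are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings from three CNN branches, one per input
# representation (dimensions assumed for illustration).
spec_emb = rng.standard_normal(128)    # Spectrogram branch output
mfcc_emb = rng.standard_normal(128)    # MFCC branch output
chroma_emb = rng.standard_normal(128)  # Chromagram branch output

# Feature-level fusion: concatenate the branch embeddings, then apply a
# single linear classifier head (weights here are random placeholders).
fused = np.concatenate([spec_emb, mfcc_emb, chroma_emb])  # shape (384,)
W = rng.standard_normal((4, fused.size))  # 4 hypothetical sound classes
logits = W @ fused

# Softmax over the logits gives a probability per class.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

An alternative design, decision-level fusion, would instead average or vote over the per-branch class probabilities; which variant FDC-FS uses is described in the full text.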

