Non-invasive, real-time, and convenient computer audition-based methods for heart sound abnormality detection have attracted growing interest in the cardiovascular disease community. Time–frequency analysis is crucial for such computer audition-based applications; however, a comprehensive investigation into the optimal way of extracting time–frequency representations from heart sounds has been lacking. To this end, we present a comprehensive investigation of time–frequency methods for analysing heart sounds, i.e., the short-time Fourier transformation, Log-Mel transformation, Hilbert–Huang transformation, wavelet transformation, Mel transformation, and Stockwell transformation. Features are learnt automatically from these time–frequency representations via pre-trained deep convolutional neural networks. Considering the urgent need of smart stethoscopes for highly robust detection algorithms in real environments, the training, validation, and test sets employed in the extensive evaluation are subject-independent. Finally, to further understand the heart sound-based digital phenotype for cardiovascular diseases, explainable artificial intelligence approaches are used to reveal the reasons for the performance differences of four time–frequency representations in heart sound abnormality detection. Experimental results show that the Stockwell transformation outperforms the other methods, achieving the highest overall score of 65.2%. The interpretability results demonstrate that the Stockwell transformation not only preserves more information about the heart sounds but also provides a degree of noise robustness. Moreover, the fine-tuned deep model improves the mean accuracy over previous state-of-the-art results by 9.0% in subject-independent testing.
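As a minimal illustration of one of the time–frequency front ends compared above, the sketch below computes a log-magnitude short-time Fourier spectrogram of a synthetic heart-sound-like signal using SciPy. The window length, hop size, sampling rate, and synthetic signal are illustrative assumptions only and are not the settings used in the study.

```python
import numpy as np
from scipy.signal import stft


def log_spectrogram(x, fs, nperseg=256, noverlap=128, eps=1e-10):
    """Log-magnitude STFT spectrogram (illustrative parameters only)."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return f, t, np.log(np.abs(Z) + eps)


# Synthetic stand-in for a heart-sound recording: two low-frequency
# bursts (roughly S1/S2-like) plus mild noise, sampled at 2 kHz.
fs = 2000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) * np.exp(-((t - 0.2) ** 2) / 0.001)
x += np.sin(2 * np.pi * 80 * t) * np.exp(-((t - 0.6) ** 2) / 0.001)
x += 0.01 * np.random.randn(t.size)

freqs, frames, S = log_spectrogram(x, fs)
print(S.shape)  # (frequency bins, time frames)
```

In a pipeline like the one described, the resulting 2-D representation would be treated as an image and passed to a pre-trained convolutional network; the alternative transforms (e.g., Log-Mel or Stockwell) would simply replace this front end.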