Abstract

Heart disease (HD) is one of the deadliest diseases worldwide and claims the most lives when it is detected late. Early detection is therefore essential so that appropriate treatment can begin in time. This paper addresses HD detection from heart sound signals using imbalanced training and testing sets, and develops suitable machine learning (ML) based detection models. Several performance metrics are evaluated and analyzed, and a novel ensemble model is developed for improved detection of heart murmur. The raw heart sound (HS) signals are preprocessed and denoised, and features are extracted from them before they are fed to the proposed models. Two feature sets, Discrete Wavelet Transform (DWT) and Mel-Frequency Cepstral Coefficients (MFCC), as well as their combination, are used separately as inputs during model development. To assess the consistency of the developed detection models, two standard datasets are used for training and validation. The classifiers used in this study are Random Forest (RF), k-Nearest Neighbor (k-NN) and Extreme Gradient Boosting (XGB). These three classifiers were chosen for their simplicity and their already proven consistent performance in detecting other diseases. The two best-performing models in the simulation study (RF and XGB) are combined into an ensemble using a Moth Flame Optimization (MFO) based weight selection scheme. Three performance metrics, accuracy, F1-score and area under the curve (AUC), are obtained through an exhaustive simulation study and then analyzed. The results show that, among the three ML models using the combined features, the RF detection scheme performs best.
This model achieves an accuracy of 88.7%, an F1-score of 0.89 and an AUC of 0.95 on the PhysioNet dataset, and 86.16%, 0.86 and 0.89, respectively, on the Pascal CHSE dataset. The RF-MFO-XGB ensemble model, however, achieves the best overall performance with the combined features as input: its accuracy, F1-score and AUC are 89.53%, 0.9 and 0.95, respectively, on the PhysioNet dataset, and 87.89%, 0.87 and 0.93, respectively, on the Pascal CHSE dataset. The proposed ensemble thus provides consistent and improved performance on both datasets, which suggests that it may also be applicable to the detection of other diseases. In addition, the model can be implemented in an Internet of Things (IoT) environment for use at remote locations.
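
The weighted combination of two classifiers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the MFO update rules or the fitness function, so a plain grid search over a single weight stands in for the MFO weight selection, and accuracy is assumed as the selection criterion. The function names and the toy probabilities are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def blend(p_a: Sequence[float], p_b: Sequence[float], w: float) -> List[float]:
    """Weighted soft vote: w * p_a + (1 - w) * p_b for each recording."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def predict(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn abnormal-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Fraction of recordings whose thresholded prediction matches the label."""
    preds = predict(probs)
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def f1_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the positive (abnormal) class."""
    preds = predict(probs)
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_weight(
    y_true: Sequence[int],
    p_a: Sequence[float],
    p_b: Sequence[float],
    steps: int = 20,
) -> Tuple[float, float]:
    """Grid search over w in [0, 1]; a simple stand-in for MFO, NOT MFO itself."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        acc = accuracy(y_true, blend(p_a, p_b, w))
        if acc > best_acc:  # keep the first weight that reaches the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation data, invented for illustration (1 = abnormal, 0 = normal).
y_val = [1, 0, 1, 0, 1, 0]
p_rf = [0.9, 0.7, 0.3, 0.1, 0.8, 0.2]   # hypothetical RF probabilities
p_xgb = [0.7, 0.1, 0.9, 0.3, 0.4, 0.7]  # hypothetical XGB probabilities

best_w, best_acc = select_weight(y_val, p_rf, p_xgb)
ensemble_probs = blend(p_rf, p_xgb, best_w)
print("RF accuracy:", accuracy(y_val, p_rf))
print("XGB accuracy:", accuracy(y_val, p_xgb))
print("ensemble weight:", best_w, "accuracy:", best_acc,
      "F1:", f1_score(y_val, ensemble_probs))
```

In this toy data the two classifiers err on different recordings, so a blended probability can outperform either model alone. In practice the weight would be selected on a held-out validation split, and a population-based optimizer such as MFO would replace the grid search when more than one weight or a non-trivial fitness function is involved.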
