Classifying Heart Disease through Fusion of Multi-Source Datasets: Integration of Feature Selection and Explainable Machine Learning Techniques
This study delves into heart disease classification through integrated feature selection and machine learning methodologies, utilizing three datasets comprising 4,728 participants and 11 features, with 4.27% missing data. Employing machine learning, we used XGBoost to achieve 0.95 accuracy for one feature, while Random Forest (RF) demonstrated accuracies of 0.92 and 0.99 for the remaining two features. Comparing 11 classification models, RF and XGBoost classified heart disease with 0.97 and 0.99 accuracy, respectively, using all available features. Applying Feature Elimination with Simultaneous Perturbation Feature Selection and Ranking (SpFSR) revealed that RF attained 0.99 accuracy by selecting only four features (cholesterol level, age, resting electrocardiographic measurements, and maximum heart rate), while XGBoost dropped to 0.91. Constructing an RF model with four features enhanced interpretability without compromising accuracy. Explainable Machine Learning (XAI) techniques, including Permutation Importance and SHAP Summary Plot analyses, gauged feature impact on heart disease prediction. The resting electrocardiographic measurements feature held the highest value (0.40 ± 0.01), followed by maximum heart rate (0.32 ± 0.01), cholesterol level (0.28 ± 0.01), and age (0.26 ± 0.005). These results underscore the significance of each feature in diagnosing heart disease via machine learning.
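The permutation importance analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the `toy_model` stand-in for a trained classifier, and the use of accuracy as the scoring metric are all assumptions made for illustration. The idea is to shuffle one feature column at a time and record how much the score drops.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with only this one column permuted.
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (assumption): only feature 0 determines the label; feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]

def toy_model(row):
    """Hypothetical stand-in for a trained classifier."""
    return int(row[0] > 0.5)

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

A feature the model ignores gets an importance of zero, while a feature the model relies on gets a positive value, which is the kind of ranking the abstract reports for its four selected features.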
- Research Article
7
- 10.32604/cmc.2022.026064
- Jan 1, 2022
- Computers, Materials & Continua
Heart disease is one of the leading causes of death in the world today, and its prediction is a prominent topic in clinical data processing. Early diagnosis of heart disease is an important area of medical research because it can increase patient survival rates. There are many studies on the prediction of heart disease, but limited work has been done on feature selection, which is one of the best techniques for the diagnosis of heart disease. In this research paper, we find optimal features using a brute-force algorithm, and machine learning techniques are used to improve the accuracy of heart disease prediction. For performance evaluation, accuracy, sensitivity, and specificity are used with split and cross-validation techniques. The proposed technique is evaluated on three heart disease datasets with different numbers of records and is found to have superior performance. The optimized features generated by the brute-force algorithm are used as input to machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), K Nearest Neighbor (KNN), and Naive Bayes (NB). The proposed technique achieved 97% accuracy with Naive Bayes through split validation and 95% accuracy with Random Forest through cross-validation. Naive Bayes and Random Forest are found to outperform the other classification approaches in terms of accuracy. Compared with the results of existing studies, the proposed technique performs better than other state-of-the-art methods. Therefore, our proposed approach plays an important role in the selection of important features and the automatic detection of heart disease.
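A brute-force feature search of the kind this abstract describes can be sketched as follows. This is an illustrative sketch only: the feature names and the `toy_score` function are hypothetical, and in practice the scoring function would train a classifier on each subset and return a validation metric such as accuracy.

```python
from itertools import combinations

def brute_force_select(features, score_fn):
    """Score every non-empty feature subset and return the best one."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

def toy_score(subset):
    """Hypothetical score: rewards two 'useful' features, penalizes subset size."""
    useful = {"age", "chol"}
    return len(useful & set(subset)) - 0.1 * len(subset)

# Hypothetical feature names, chosen only for illustration.
features = ["age", "sex", "chol", "fbs"]
best_subset, best_score = brute_force_select(features, toy_score)
print(best_subset, best_score)
```

With n features the search evaluates 2^n - 1 subsets, so this approach is only practical for small feature sets such as those in typical heart disease datasets.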
- Research Article
- 10.25299/itjrd.2025.17941
- Apr 24, 2025
- IT Journal Research and Development
This study evaluates the performance of three machine learning models, namely Random Forest, Support Vector Machine (SVM), and Logistic Regression, in predicting heart disease using the "Heart Disease UCI" dataset from Kaggle. The models were assessed based on accuracy, precision, recall, and F1-score, both with and without feature selection techniques such as Chi-Square and Mutual Information. Without feature selection, Random Forest achieved the highest performance with an accuracy of 89.7%, followed by SVM with 87.0%, and Logistic Regression with 84.2%. Using Mutual Information for feature selection, Random Forest achieved an accuracy of 85.3%, SVM 87.0%, and Logistic Regression 82.6%. With Chi-Square feature selection, Random Forest and Logistic Regression both showed an accuracy of 83.2%, while SVM achieved 82.6%. The results indicate that Random Forest consistently performs well across different scenarios, making it a robust choice for heart disease prediction. Feature selection did not significantly enhance model performance, suggesting that the initial features in the dataset are already highly relevant. These findings highlight the potential of machine learning, especially Random Forest, in aiding clinical diagnosis of heart disease. Further research is needed to validate these models on larger, more diverse datasets and to explore advanced feature selection techniques for improved model performance.
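To make the Chi-Square feature scoring mentioned above concrete, here is a minimal sketch of the Pearson chi-square statistic between one categorical feature and the class label. It is an assumption that scoring is done on a contingency table like this; library implementations may compute the statistic differently, and the toy vectors are invented for illustration.

```python
from collections import Counter

def chi_square(feature, label):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    stat = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            stat += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return stat

# Toy vectors (assumption): one feature mirrors the label, one is independent of it.
label = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]
uninformative = [1, 1, 0, 0, 1, 1, 0, 0]
scores = {
    "informative": chi_square(informative, label),
    "uninformative": chi_square(uninformative, label),
}
print(scores)
```

Features would then be ranked by score and the top-ranked ones kept; a higher statistic indicates a stronger association with the label.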
- Conference Article
6
- 10.5220/0008381505080515
- Jan 1, 2019
Machine Learning (ML) is transforming industries from delivering ordinary products to delivering intelligent products. Computers analyse large sets of data points and apply relationship modelling predictively in real time to obtain accurate results. ML is adopted in healthcare to increase efficiency, save money, and save lives: it reduces the cost of medical treatment and optimizes healthcare processes throughout the organization. ML improves healthcare delivery and patient health, improves diagnosis and treatment options, and empowers individuals to take control of their health. Advances in diagnosis, predictive healthcare, and medicines, together with patient support through ML interfaces, produce better results. Heart disease covers a range of medical complications related to the heart. In recent years, ML has spread into every field, and its use in healthcare has increased significantly. This research work aims at the prediction and classification of heart disease using ML algorithms. The experimental results are classified into five heart disease stages using the values 0, 1, 2, 3, and 4, with 0 for no heart disease and 4 for severe heart disease. The Area Under the Curve (AUC) values depict the accuracy level of the predictions of the proposed model. The results are displayed as charts that make it easy to analyse the number of people having chest pains. The ML analytical report is presented in the form of charts or other visuals so that the results are reported informatively. This analysis is helpful for doctors and the medical industry for several case studies.
- Research Article
8
- 10.1016/j.heliyon.2024.e38731
- Oct 1, 2024
- Heliyon
Early heart disease prediction using feature engineering and machine learning algorithms
- Research Article
1
- 10.53759/7669/jmc202303048
- Oct 5, 2023
- Journal of Machine and Computing
A Health care Management System (HMS) is key to the successful management of any health care industry. Health care management systems have many research dimensions, such as disease identification and diagnosis, drug discovery and manufacturing, bioinformatics problems, personalized treatment, and patient image analysis. Heart Disease Prediction (HDP) is the process of identifying heart disease in advance and recognizing a patient's health condition by applying techniques to the patient's heart-related symptoms. Nowadays, the problem of identifying heart disease is addressed with machine learning techniques. In this paper we construct a heart disease prediction method that combines feature selection and classification machine learning techniques. According to existing studies, one of the main difficulties in heart disease prediction systems is that the data available in open sources do not properly record the necessary characteristics, and finding the useful features among the available ones lags behind. The process of removing inappropriate features from an available feature set while preserving sufficient classification accuracy is known as feature selection. The methodology proposed in this paper consists of two phases: phase one employs two broad categories of feature selection techniques to identify efficient feature sets, which are then passed as input to the second phase, classification. In this work we concentrate on filter-based feature selection methods, namely Chi-square, Fast Correlation Based Filter (FCBF), Gini Index (GI), and ReliefF, and on wrapper-based feature selection methods, namely Backward Feature Elimination (BFE), Exhaustive Feature Selection (EFS), Forward Feature Selection (FFS), and Recursive Feature Elimination (RFE). The UCI heart disease data set is used to evaluate the output in this study. Finally, the proposed system's performance is validated through various experimental setups.
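Among the wrapper methods listed above, Forward Feature Selection is the simplest to sketch. The following is a minimal greedy version under stated assumptions: the feature names and the `toy_score` function are hypothetical, and a real wrapper would score each candidate set by training and validating a classifier.

```python
def forward_select(features, score_fn):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        candidate_scores = [(score_fn(selected + [f]), f) for f in remaining]
        score, feature = max(candidate_scores, key=lambda pair: pair[0])
        if score <= best_score:
            break  # No remaining feature improves the current set.
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

def toy_score(subset):
    """Hypothetical score: rewards two 'useful' features, penalizes subset size."""
    useful = {"age", "chol"}
    return len(useful & set(subset)) - 0.1 * len(subset)

# Hypothetical feature names, chosen only for illustration.
selected, score = forward_select(["age", "sex", "chol", "fbs"], toy_score)
print(selected, score)
```

Unlike exhaustive search, the greedy loop evaluates at most n(n+1)/2 candidate sets for n features, at the cost of possibly missing the globally best subset.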
- Research Article
- 10.58414/scientifictemper.2024.15.spl.36
- Oct 16, 2024
- The Scientific Temper
Heart disease remains a leading cause of mortality worldwide, emphasizing the urgent need for effective classification and prediction methodologies. This literature review explores various data mining and machine learning approaches utilized in the classification and prediction of heart disease. We systematically analyze a diverse range of techniques, including decision trees, support vector machines, artificial neural networks, and ensemble methods, highlighting their strengths and limitations. The review further examines pre-processing methods, feature selection, and extraction techniques that significantly impact model performance. Additionally, we discuss the integration of hybrid approaches and deep learning methods, showcasing their potential to enhance predictive accuracy. Recent advancements in data handling and algorithmic efficiency are also highlighted, demonstrating the promising role of machine learning in addressing the complexities of heart disease diagnosis. This review aims to provide a comprehensive understanding of current trends and future directions in heart disease classification and prediction, paving the way for improved diagnostic tools and health outcomes.
- Research Article
18
- 10.11591/ijece.v13i2.pp2177-2185
- Apr 1, 2023
- International Journal of Electrical and Computer Engineering (IJECE)
Artificial intelligence is a science that is growing at tremendous speed and has become an essential part of many domains, including the medical domain. Countless artificial intelligence applications can therefore be seen in the medical domain at various levels, where they are employed to enhance early diagnosis and prediction and to reduce the risks associated with many diseases, including heart disease. In this article, machine learning techniques (logistic regression, random forest, artificial neural network, support vector machines, and k-nearest neighbors) are used to diagnose heart disease from the Cleveland Clinic dataset, obtained from the University of California Irvine (UCI) machine learning repository and the Kaggle platform, and the performance of these techniques is then compared. In addition, literature on machine learning and deep learning techniques that aim to provide reasonable solutions for monitoring, detecting, diagnosing, and predicting heart disease, and on how these technologies assist in making health decisions, is reviewed. Ten studies published between 2017 and 2022 are selected and summarized by the authors. After a series of tests, the best performance in diagnosing heart disease is obtained by support vector machines, with a diagnostic accuracy of 96%. This article concludes that these techniques play a significant and influential role in assisting physicians and health care workers in analyzing heart patients' data, making health decisions, and saving patients' lives.
- Research Article
- 10.55041/ijsrem50624
- Jun 16, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Heart disease (HD), including heart attacks, is a leading cause of death worldwide, making accurate determination of a patient's risk a significant challenge in medical data analysis. Early detection and continuous monitoring by physicians can significantly reduce mortality rates, but heart disease is not always easily detectable, and physicians cannot monitor patients around the clock. Machine learning (ML) offers a promising solution to enhance diagnostics through more accurate predictions based on data from healthcare sectors globally. This study aims to employ various feature selection methods to develop an effective ML technique for early-stage heart disease prediction. The feature selection process utilized three distinct methods: chi-square, analysis of variance (ANOVA), and mutual information (MI), leading to three selected feature groups designated as SF-1, SF-2, and SF-3. We then evaluated ten different ML classifiers, including Naive Bayes, support vector machine (SVM), voting, XGBoost, AdaBoost, bagging, decision tree (DT), K-nearest neighbor (KNN), random forest (RF), and logistic regression (LR), to identify the best approach and feature subset. The proposed prediction method was validated using a private dataset, a publicly available dataset, and multiple cross-validation techniques. Keywords: Machine Learning (ML), Heart Disease Classification, Predictive Modeling, Cardiovascular Disease (CVD), Classification Algorithms
- Book Chapter
2
- 10.1007/978-981-16-6460-1_51
- Jan 1, 2022
Early diagnosis of cardiovascular diseases is an extreme necessity in today's world, since cardiovascular diseases are the most prevalent reason for the increase in mortality rate. Computer-based prediction methods such as machine learning techniques, artificial intelligence, and other recent technologies are used to analyze clinical data for early diagnosis of disease. A heart disease prediction model is developed with clinical data using machine learning techniques. Clinical data is generated in the healthcare industry from various sources such as electronic health records, sensor data, hospital data, IoT, and social media data. Because of this variety of sources, healthcare data may contain irrelevant, noisy, and redundant data as well as a large number of features, all of which have a significant impact on prediction model accuracy. The selection of the right features is the main step in developing prediction models using machine learning algorithms. Feature selection is the method of removing redundant, noisy, and inappropriate data; it selects the important features from the clinical dataset for developing machine learning models and improves the correctness and efficiency of those models. This research paper discusses various feature selection methods for selecting significant attributes and eliminating inappropriate attributes in the dataset. Wrapper, Filter, and Embedded methods are analyzed and implemented in Python using the Kaggle heart disease dataset. The objective of this research article is to find the major risk factor for heart disease.
From the implementation of feature selection techniques, we found chest pain, maximum heart rate, ST depression induced by exercise relative to rest, number of major vessels colored by fluoroscopy, and exercise-induced angina to be the important risk factors for heart disease. Keywords: Heart disease, Feature selection, Optimization techniques, Machine learning, Electronic health record, Artificial intelligence, Healthcare
- Research Article
- 10.55041/ijsrem29638
- Mar 25, 2024
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Machine Learning (ML) is increasingly applied in various sectors globally, and the healthcare sector is no exception. In particular, ML can significantly contribute to the early detection of locomotor disorders and heart diseases. Timely predictions can offer valuable insights to physicians, enabling them to tailor their diagnostic and treatment strategies for individual patients. This project focuses on the use of ML algorithms to predict the likelihood of heart disease in individuals. It involves a comparative analysis of several classifiers, including decision trees, Naïve Bayes, Logistic Regression, SVM, and Random Forest. Furthermore, the project introduces an ensemble classifier that combines the strengths of both robust and less robust classifiers. This approach allows for the utilization of numerous samples for training and validation purposes. We analyze both existing classifiers and proposed classifiers like AdaBoost and XGBoost, aiming to enhance accuracy and predictive capabilities. Heart disease remains a significant concern worldwide, and early detection is crucial for preventing severe outcomes and enhancing patient care. Machine learning techniques have shown promise in increasing the accuracy of heart disease predictions. This paper discusses the use of ML in predicting heart disease, emphasizing its potential advantages and the challenges encountered. The main goal of this research is to assess the performance of various ML algorithms in predicting heart disease risk based on patient data. The dataset comprises a wide array of variables, including age, gender, blood pressure, cholesterol levels, exercise patterns, and medical history. After preprocessing these variables, we train and test several ML models, such as Logistic Regression, Random Forest, and Support Vector Machines (SVM), to evaluate their effectiveness.
- Research Article
3
- 10.14569/ijacsa.2022.0130965
- Jan 1, 2022
- International Journal of Advanced Computer Science and Applications
There is a continuous increase in the death rate related to cardiac disease across the world. Predicting heart disease in advance may help experts suggest pre-emptive measures to minimize the risk of death. The early diagnosis of heart disease symptoms is made possible by machine learning technologies. Existing machine learning models are inefficient in terms of simulation error, accuracy, and timing for heart disease prediction; hence, a more efficient approach is needed. In this research paper, a model based on machine learning techniques is proposed for the early and accurate prediction of heart disease. The proposed model is based on techniques for feature optimization, feature selection, and ensemble learning. Using the WEKA 3.8.3 tool, feature selection and feature optimisation techniques are applied to eliminate irrelevant features, and the pragmatic features are then tested using ensemble techniques. Further, the proposed model is compared with the existing model without feature selection and feature optimisation in terms of heart disease prediction effectiveness. The proposed model is found to give better performance in terms of simulation error, response time, and accuracy in heart disease prediction.
- Research Article
- 10.17762/turcomat.v12i9.3697
- May 10, 2021
Various machine learning methods have been presented to predict and diagnose heart disease. Accurate heart disease prediction is important for treating cardiac patients before a heart attack occurs. Existing methods fail to improve the performance of heart disease prediction and use conventional methods to choose features from the dataset. This paper proposes feature extraction approaches and classification using ensemble deep learning for heart disease prediction. First, features are extracted using SIFT and AlexNet from the image instance-segmented by a Mask Region-Based Convolutional Neural Network (RCNN). Second, hybrid classification combining Random Forest and Gaussian Naive Bayes is used to detect heart attack. The proposed method is evaluated on heart disease data, and a comparison of testing and training data shows that it achieves better results. This outcome indicates that our method is more effective for heart attack prediction.
- Research Article
18
- 10.2174/1872212113666190328220514
- Mar 9, 2021
- Recent Patents on Engineering
Background: Diagnosing diseases is an intricate job in the medical field. Machine learning, when applied to health care, is capable of early detection of disease, which aids early medical intervention. In heart disease prediction, machine learning techniques have played a significant role. Analysis of disease has become vital in the health care sector. The massive data collected by the healthcare sector are preprocessed and analyzed to discover the underlying information for effective decision making and proper medical intervention. The success of machine learning in the medical industry lies in its capability to analyze the huge amount of data gathered by the health sector and its effectiveness in decision making. Since the medical field involves many manual processes, it has become necessary to automate these procedures, and remarkable advancements in electronic medical records have made this possible. Objective: The objective of this research is to design a robust machine learning algorithm to predict heart disease. The prediction of heart disease is performed using an ensemble of machine learning algorithms in order to boost the accuracy achieved by individual machine learning algorithms. Methods: A Heart Disease Prediction System is developed in which the user can input the patient details, and the prediction for that patient is made using the model developed. The model predicts the output to be either normal or risky. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and the Naïve Bayes classifier are used as base learners. These algorithms are combined using random forest as the meta classifier. Results: The predictions of the classifiers are combined using the random forest algorithm. The accuracy is lifted from 85.53% to 87.64%, which is an impressive improvement in accuracy.
Conclusion: Various techniques were adopted to preprocess the data to suit the requirements of the analysis. Feature selection was performed to optimize the performance of the machine learning algorithms. Ensemble prediction gave better accuracy when the random forest algorithm was used as the combiner. Better feature selection techniques can be applied to further improve the accuracy.
- Research Article
21
- 10.3390/ai4040053
- Dec 1, 2023
- AI
Globally, over 17 million people annually die from cardiovascular diseases, with heart disease being the leading cause of mortality in the United States. The ever-increasing volume of data related to heart disease opens up possibilities for employing machine learning (ML) techniques in diagnosing and predicting heart conditions. While applying ML demands a certain level of computer science expertise—often a barrier for healthcare professionals—automated machine learning (AutoML) tools significantly lower this barrier. They enable users to construct the most effective ML models without in-depth technical knowledge. Despite their potential, there has been a lack of research comparing the performance of different AutoML tools on heart disease data. Addressing this gap, our study evaluates three AutoML tools—PyCaret, AutoGluon, and AutoKeras—against three datasets (Cleveland, Hungarian, and a combined dataset). To evaluate the efficacy of AutoML against conventional machine learning methodologies, we crafted ten machine learning models using the standard practices of exploratory data analysis (EDA), data cleansing, feature engineering, and others, utilizing the sklearn library. Our toolkit included an array of models—logistic regression, support vector machines, decision trees, random forest, and various ensemble models. Employing 5-fold cross-validation, these traditionally developed models demonstrated accuracy rates spanning from 55% to 60%. This performance is markedly inferior to that of AutoML tools, indicating the latter’s superior capability in generating predictive models. Among AutoML tools, AutoGluon emerged as the superior tool, consistently achieving accuracy rates between 78% and 86% across the datasets. PyCaret’s performance varied, with accuracy rates from 65% to 83%, indicating a dependency on the nature of the dataset. AutoKeras showed the most fluctuation in performance, with accuracies ranging from 54% to 83%. 
Our findings suggest that AutoML tools can simplify the generation of robust ML models that potentially surpass those crafted through traditional ML methodologies. However, we must also consider the limitations of AutoML tools and explore strategies to overcome them. The successful deployment of high-performance ML models designed via AutoML could revolutionize the treatment and prevention of heart disease globally, significantly impacting patient care.
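The 5-fold cross-validation used in this study to evaluate the traditionally developed models can be sketched as an index-splitting routine. This is a minimal illustration, not the study's code: it assumes contiguous folds without shuffling or stratification, which real evaluations often add.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds; yield (train, test) index lists."""
    # Spread any remainder over the first few folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a model's accuracy over the five folds uses every record for both training and evaluation.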
- Research Article
57
- 10.1088/1757-899x/1022/1/012046
- Jan 1, 2021
- IOP Conference Series: Materials Science and Engineering
Machine Learning (ML), one of the most prominent applications of Artificial Intelligence, is achieving remarkable results in research. In this paper, machine learning is used to detect whether or not a person has heart disease. Many people around the world suffer from cardiovascular diseases (CVDs), which can cost them their lives. Machine learning can detect whether a person is suffering from a cardiovascular disease by considering attributes such as chest pain, cholesterol level, and age. Classification algorithms based on supervised learning, a type of machine learning, can make the diagnosis of cardiovascular diseases easier. Two supervised machine learning algorithms, K-Nearest Neighbor (K-NN) and Random Forest, are used in this paper to distinguish people who have heart disease from people who do not. The prediction accuracy obtained by K-Nearest Neighbor (K-NN) is 86.885%, and the prediction accuracy obtained by Random Forest is 81.967%.
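The K-Nearest Neighbor classifier named in this abstract can be illustrated with a minimal sketch. The toy points, the choice of Euclidean distance, and k = 3 are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
import math

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature data (assumption): two well-separated clusters.
train_rows = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(train_rows, train_labels, [0.5, 0.5])
near_far_cluster = knn_predict(train_rows, train_labels, [5.5, 5.5])
print(near_origin, near_far_cluster)
```

Because the vote depends on raw distances, clinical attributes measured on different scales (for example age and cholesterol level) would normally be standardized before applying K-NN.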