A penalty-embedded genetic algorithm for interpretable clinical feature selection in heart disease prediction
Heart disease decision support often requires identifying a small set of routine clinical tests that preserves reliable multi-class predictive performance while reducing diagnostic burden. However, many metaheuristic wrapper studies emphasize accuracy and rely on implicit or post hoc size control, which can yield larger panels under comparable computational budgets. This work addresses this gap by proposing GA-CombinedFitness, a penalty-embedded genetic algorithm that optimizes a single combined objective of five-fold cross-validated logistic-regression accuracy and subset size under a primary configuration. The approach was evaluated on a harmonized multi-cohort dataset of 920 patients with eight routinely collected predictors and benchmarked against seven alternative search techniques under a controlled protocol with matched iteration and cross-validation settings. GA-CombinedFitness achieved the highest mean combined fitness (CF=0.4805) while selecting a single-feature subset (cp, chest-pain type) with mean accuracy of 0.5478, mean balanced accuracy of 0.2981, and mean macro-F1 of 0.2506. Accuracy-driven methods achieved higher accuracy and macro-F1 on average but selected larger subsets, which reduced their combined-fitness scores under the stated objective. Post hoc SHAP, PDP+ICE, and ALE analyses were used as plausibility checks of model behavior under the evaluation protocol and of the predictors emphasized by the combined objective. These results indicate that embedding a parsimony penalty within genetic search can improve the accuracy–cost trade-off and yield compact, interpretable panels for cost-constrained heart disease workflows.
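The abstract does not give the combined-fitness formula, so the sketch below assumes a common linear form, CF = w·acc − (1 − w)·k/d, where k is the number of selected features and d = 8. The weight w = 0.9 is an assumption, chosen because it reproduces the reported figure for a one-of-eight subset (0.9 × 0.5478 − 0.1 × 1/8 ≈ 0.4805). `toy_accuracy` is a hypothetical stand-in for five-fold cross-validated logistic-regression accuracy. The GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism) are generic choices and not necessarily the authors'.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature selected, 0 = dropped


def combined_fitness(acc: float, k: int, d: int, w: float = 0.9) -> float:
    """Assumed CF: weighted accuracy minus a penalty on the fraction of features used."""
    return w * acc - (1.0 - w) * (k / d)


def ga_select(accuracy_fn: Callable[[Mask], float], d: int, pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.1, seed: int = 0) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def fitness(mask: Mask) -> float:
        if sum(mask) == 0:
            return float("-inf")  # an empty panel cannot be scored
        if mask not in cache:  # accuracy_fn (e.g. 5-fold CV) is the expensive call
            cache[mask] = combined_fitness(accuracy_fn(mask), sum(mask), d)
        return cache[mask]

    def tournament(pop: List[Mask]) -> Mask:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: keep the best subset unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover, then independent bit-flip mutation per gene.
            child = tuple(g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2))
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


def toy_accuracy(mask: Mask) -> float:
    """Hypothetical stand-in for 5-fold CV logistic-regression accuracy."""
    return 0.40 + 0.15 * mask[0] + 0.005 * (sum(mask) - mask[0])


best, score = ga_select(toy_accuracy, d=8)
print(best, round(score, 4))
```

With this toy accuracy, the single-feature panel containing feature 0 scores 0.9 × 0.55 − 0.1 × 1/8 = 0.4825. Adding any other feature lowers CF, because its 0.0045 weighted-accuracy gain is smaller than the 0.0125 size penalty. This is the parsimony pressure the abstract describes.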
- Research Article
- 10.1007/s10586-017-1530-z
- Jan 6, 2018
- Cluster Computing
In healthcare, prediction and analysis have been carried out for a wide range of diseases. Today, cardiac disease is the most common disease putting people at risk. The idea here is to analyze a dataset covering a variety of patients and to predict the chance of a heart attack arising from factors such as high blood pressure, family history, and age. Among all diseases, heart disease is one of the most hazardous and frequently leads to death. A heart attack occurs when blood flow to the heart is blocked by a build-up of fat, cholesterol, and other substances in the arteries that feed the heart. It permanently damages or destroys part of the heart muscle, leaving a permanent scar on the heart. The proposed approach extracts features from the dataset and constructs a decision table from them. Irrelevant attributes are removed with a feature selection algorithm, and the dependency of the attributes on the disease outcome is then determined with an optimality criterion function. As a result, the time taken to predict heart disease is reduced compared with other algorithms. The dataset, collected from the UCI repository, contains 14 attributes and is analyzed with the Optimality Criterion Feature Selection algorithm; three attributes, including resting electrocardiography and chest pain type, are used for decision making.
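The abstract does not define its optimality criterion function. A standard way to quantify how strongly a set of condition attributes C determines the decision D in a decision table is the rough-set dependency degree γ(C, D) = |POS_C(D)| / |U|. This is the fraction of records whose condition values always map to a single decision. Treating it as the paper's criterion is an assumption, and the toy table below is hypothetical rather than drawn from the UCI data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def dependency_degree(rows: List[Dict[str, object]], conds: Sequence[str], decision: str) -> float:
    """Rough-set dependency gamma(C, D): share of rows in consistent equivalence classes."""
    classes = defaultdict(set)  # condition-value tuple -> decisions observed for it
    counts = defaultdict(int)   # condition-value tuple -> number of rows
    for r in rows:
        key = tuple(r[c] for c in conds)
        classes[key].add(r[decision])
        counts[key] += 1
    # Rows whose condition values never lead to conflicting decisions form the positive region.
    positive = sum(n for key, n in counts.items() if len(classes[key]) == 1)
    return positive / len(rows)


# Hypothetical toy decision table (illustrative values only).
table = [
    {"cp": 4, "restecg": 0, "target": 1},
    {"cp": 4, "restecg": 2, "target": 1},
    {"cp": 1, "restecg": 0, "target": 0},
    {"cp": 1, "restecg": 2, "target": 1},
]
print(dependency_degree(table, ["cp"], "target"))
print(dependency_degree(table, ["cp", "restecg"], "target"))
```

An attribute whose removal leaves γ unchanged adds nothing to determining the decision and is a natural candidate for elimination.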
- Research Article
- 10.11591/ijict.v14i3.pp751-759
- Dec 1, 2025
- International Journal of Informatics and Communication Technology (IJ-ICT)
People with conditions such as diabetes, high blood pressure, and high cholesterol face an increased risk of heart disease and stroke as they age. To mitigate this threat, predictive models based on machine learning (ML) and artificial intelligence (AI) have emerged as valuable tools; however, heart disease prediction is a complex task, and diagnostic outcomes are often inaccurate. Existing ML techniques generally require data to be pooled in a centralized, easily accessible location before heart disease can be detected. This review introduces federated learning (FL) as a way to address the data-privacy challenges of heart disease prediction. FL, a collaborative technique pioneered by Google, trains a shared model across independent participants, each using its own local dataset. The paper surveys recent ML methods and databases for predicting cardiovascular disease (heart attack). Previous research has explored algorithms such as region-based convolutional neural networks (RCNN), convolutional neural networks (CNN), and federated logistic regression (FLR) for heart disease and other disease prediction. FL allows a collaborative model to be trained while patient data remain distributed across sites, preserving privacy and security. The paper examines how effectively FL can improve the accuracy of cardiovascular disease (CVD) prediction models while preserving data privacy across distributed datasets.
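Federated averaging (FedAvg) is the canonical aggregation scheme behind the kind of federated logistic regression the review mentions. In the sketch below, each site runs a few epochs of gradient descent on its own records and sends only the resulting weights to a server. The server averages those weights in proportion to site size. The two site datasets, the learning rate, and the round and epoch counts are illustrative assumptions.

```python
import math
from typing import List, Tuple

Dataset = List[Tuple[List[float], int]]  # (features, binary label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(w: List[float], data: Dataset, lr: float = 0.5, epochs: int = 5) -> List[float]:
    """Full-batch gradient descent on the logistic loss, run privately at one site."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            xb = x + [1.0]  # append a constant 1 for the bias term
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for j, xj in enumerate(xb):
                grad[j] += err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def fed_avg(sites: List[Dataset], n_features: int, rounds: int = 20) -> List[float]:
    w = [0.0] * (n_features + 1)
    total = sum(len(s) for s in sites)
    for _ in range(rounds):
        local = [local_update(w, s) for s in sites]  # only weights leave each site
        # Server step: average local weights, weighted by each site's sample count.
        w = [sum(len(s) * lw[j] for s, lw in zip(sites, local)) / total for j in range(len(w))]
    return w


def predict(w: List[float], x: List[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0]))) >= 0.5)


# Two hypothetical hospitals with one-feature, separable data.
site_a = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
site_b = [([-1.5], 0), ([1.5], 1)]
weights = fed_avg([site_a, site_b], n_features=1)
print(weights)
```

Raw records never leave `site_a` or `site_b`. Only the weight vector returned by `local_update` is shared, which is the privacy property the review emphasizes.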
- Research Article
- 10.54254/2755-2721/2025.17852
- Nov 22, 2024
- Applied and Computational Engineering
Heart disease has long been a major threat to human health, causing about one-third of all deaths each year, so accurate and effective prediction of cardiac disease with machine learning is needed. This paper uses data from 1,319 patients with heart disease and applies several machine learning methods, including neural networks, decision trees, and random forests, to study the relationship between eight factors. The best model, built with the random forest method, achieved an accuracy of 96.97%, precision of 97.03%, recall of 96.97%, and an F1-score of 0.9. Troponin and CK-MB had the highest influence weights and together accounted for more than 75% of the total weight in each model, demonstrating their importance for heart disease prediction. These findings can inform future heart disease prediction.
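The reported recall (96.97%) equals the reported accuracy. This is consistent with, though not proof of, support-weighted averaging of per-class metrics, because weighted recall always reduces to accuracy: Σ_c (n_c/N)·(TP_c/n_c) = Σ_c TP_c / N. The sketch below computes weighted precision, recall, and F1 on hypothetical labels so the identity can be checked. The averaging scheme the paper actually used is not stated.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)
    out = {"accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
           "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c, n_c in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / n_c
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = n_c / n  # weight each class by its share of the true labels
        out["precision"] += w * prec
        out["recall"] += w * rec
        out["f1"] += w * f1
    return out


# Hypothetical labels for three classes.
m = weighted_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(m)
```

Because weighted recall collapses to accuracy, reporting macro-averaged metrics alongside it is more informative when classes are imbalanced.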
- Research Article
- 10.9734/ajrcos/2024/v17i5445
- Mar 11, 2024
- Asian Journal of Research in Computer Science
In this study, we examine the role of dimension reduction techniques in the performance of machine learning algorithms for heart disease prediction. Using a dataset with key features such as age, sex, chest pain type, blood pressure, and cholesterol levels, we investigate how three techniques, namely Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Linear Discriminant Analysis (LDA), affect classification performance. The classification algorithms considered were Logistic Regression, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Naive Bayes, and Deep Neural Network (DNN), trained and validated with K-fold cross-validation. Performance was assessed with accuracy, F1-score, precision, recall, and specificity. The results reveal that Linear Discriminant Analysis consistently improved algorithm performance across all assessed metrics, and that Naive Bayes and Logistic Regression stood out for their resilience and reliability across diverse scenarios. These findings clarify the interplay between dimension reduction techniques and algorithm selection and offer insights for building more accurate and robust heart disease prediction strategies.
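K-fold cross-validation partitions the data into K disjoint folds and uses each fold once as the validation set. The dependency-free sketch below shows the index split. The fixed shuffle seed and the rule for distributing leftover samples (fold sizes differ by at most one) are assumptions. In a leakage-free pipeline, PCA, KPCA, or LDA would be fitted inside each training split rather than on the full dataset.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; fold sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n, k)
    start = 0
    for f in range(k):
        size = base + (1 if f < extra else 0)  # the first `extra` folds get one more sample
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


folds = list(kfold_indices(23, k=5))
print([len(v) for _, v in folds])
```

Each index appears in exactly one validation fold, so averaging the K validation scores uses every sample once for evaluation.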
- Research Article
- 10.54097/hset.v54i.9755
- Jul 4, 2023
- Highlights in Science Engineering and Technology
Although previous studies have shown differences in heart disease between men and women, the importance of specific physical and chemical factors for predicting heart disease in each gender has not been clarified. In this research, K-means clustering, multiple linear regression, logistic regression, and random forest are used to analyze the UCI Heart Disease Data Set, which contains various physical and chemical indicators worth studying. The results show that exercise-induced angina is more significant for judging heart disease in women, whereas the number of major vessels colored by fluoroscopy is more significant in men, and chest pain type is a statistically significant variable for both. Thalassemia, exercise-induced ST depression relative to rest, maximum heart rate achieved, age, and resting blood pressure also have reference value for judging heart disease. In terms of model fit, random forest achieved the highest accuracy for both women and men; for women, logistic regression ranked second and multiple linear regression third, while for men, multiple linear regression ranked second and logistic regression third. These conclusions refine previous studies and suggest that the work is relevant to preventing heart disease in different population groups.
- Research Article
- 10.26714/jsunimus.11.2.2023.44-50
- Nov 30, 2023
- Jurnal Statistika Universitas Muhammadiyah Semarang
Heart disease is the main cause of death in humans. Even though preventive measures such as dietary control, lowering cholesterol, and managing weight, diabetes, and hypertension have been taken, heart disease remains a major health problem. Several factors are associated with heart disease, including age, chest pain type, high blood pressure, blood sugar levels, ECG test values, maximum heart rate, and exercise-induced angina. To reduce the percentage of deaths due to heart disease, a system that can predict heart disease is needed. This research combines the Backward Elimination and Naive Bayes algorithms to increase accuracy in diagnosing heart disease. Naive Bayes alone achieved an accuracy of 78.90% and an Area Under the Curve (AUC) of 0.86, which falls in the good classification category, whereas the combination of Backward Elimination and Naive Bayes achieved an accuracy of 82.31% and an AUC of 0.88.
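Backward elimination starts from the full feature set and repeatedly drops the feature whose removal gives the best validation score. It stops once every possible removal would lower the score. In the study the score would come from the Naive Bayes classifier. Here `toy_score` is a hypothetical placeholder, so only the search logic is shown.

```python
from typing import Callable, FrozenSet, Iterable


def backward_elimination(features: Iterable[str],
                         score_fn: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    current = frozenset(features)
    best_score = score_fn(current)
    while len(current) > 1:
        # Score every one-feature-smaller subset and keep the best candidate.
        cand_score, drop = max((score_fn(current - {f}), f) for f in sorted(current))
        if cand_score < best_score:
            break  # every removal hurts: stop
        current, best_score = current - {drop}, cand_score  # ties favour the smaller set
    return current


def toy_score(s: FrozenSet[str]) -> float:
    """Hypothetical validation score: two useful features, small cost per feature."""
    return 0.7 + 0.1 * ("cp" in s) + 0.05 * ("thalach" in s) - 0.02 * len(s)


print(sorted(backward_elimination(["age", "chol", "cp", "thalach"], toy_score)))
```

Each pass costs one model evaluation per remaining feature. That is cheap for the handful of clinical attributes in this study, but it grows quadratically with the number of features.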
- Research Article
- 10.1016/j.cie.2021.107651
- Sep 1, 2021
- Computers & Industrial Engineering
Evolutionary algorithm-based convolutional neural network for predicting heart diseases
- Research Article
- 10.52152/spr/2021.137
- Aug 15, 2021
- Science Progress and Research
Several researchers have developed intelligent medical systems to support and enhance the diagnosis and prediction of heart disease. However, few studies examine the capability of ensemble methods for building heart disease detection and prediction models. In this study, the researchers assessed how an ensemble model, which offers more stable performance than a single base learning algorithm, can outperform other heart disease prediction models. Patient heart disease records were extracted from the University of California, Irvine (UCI) Machine Learning Repository, and a meta-algorithm was developed to achieve the study's aim. The experimental results show that the ensemble model is superior in predictive accuracy and in the reliability of its diagnostic output. The work also presents an ensemble heart disease prediction model as a valuable, cost-effective, and timely predictive option, with a user-friendly graphical user interface that is scalable and extensible. Based on these findings, the researchers recommend Bagging as the ensemble classifier to adopt, since it achieved the highest prediction probability score in the heart disease prediction implementation.
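Bagging (bootstrap aggregating) trains each base learner on a bootstrap resample of the training data and combines their predictions by majority vote. The abstract does not name the base learner or the ensemble size, so the sketch uses a hypothetical one-feature threshold stump and 15 estimators purely for illustration.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]        # (single feature value, class label)
Stump = Tuple[float, int, int]    # (threshold, label if x <= threshold, label otherwise)


def fit_stump(data: List[Sample]) -> Stump:
    """Pick the threshold and side labels that minimise training errors."""
    best = None
    for thr, _ in data:
        left = [y for x, y in data if x <= thr]  # never empty: contains thr itself
        right = [y for x, y in data if x > thr]
        lab_l = Counter(left).most_common(1)[0][0]
        lab_r = Counter(right).most_common(1)[0][0] if right else lab_l
        errors = sum(y != lab_l for y in left) + sum(y != lab_r for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, lab_l, lab_r)
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, x: float) -> int:
    thr, lab_l, lab_r = stump
    return lab_l if x <= thr else lab_r


def bagging(data: List[Sample], n_estimators: int = 15, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    # Each stump sees a bootstrap sample: len(data) draws with replacement.
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_estimators)]


def predict_bagged(stumps: List[Stump], x: float) -> int:
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]


train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
ensemble = bagging(train)
print(predict_bagged(ensemble, 0.5), predict_bagged(ensemble, 10.0))
```

Averaging over resamples mainly reduces variance. This is why bagging tends to give the "more stable performance" the study attributes to ensembles, especially with unstable base learners such as deep decision trees.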
- Research Article
- 10.1155/acis/1989813
- Jan 1, 2025
- Applied Computational Intelligence and Soft Computing
Heart disease is a leading cause of morbidity and mortality globally and a significant public health challenge, so early prediction and detection are critical for enabling timely and appropriate intervention. This study investigated the ability of four ensemble tree-based algorithms to predict heart disease: adaptive boosting, extreme gradient boosting, random forest, and extremely randomized trees. Data on heart disease clinical features were obtained from the open Kaggle Machine Learning Dataset repository. Adaptive Boosting was the highest performer, achieving an average testing accuracy of 93.70%, precision of 93.71%, recall of 93.70%, F1 score of 93.69%, and the highest AUC of all competing models, 0.9708. These metrics indicate a superior ability to distinguish between patients with and without heart disease, making it particularly valuable for clinical applications where early detection can save lives. The SHapley Additive exPlanations (SHAP) framework, used to assess the relative importance of the features, identified ST slope, chest pain type, oldpeak, and cholesterol as the most influential predictors, further aiding understanding of heart disease mechanisms. Future work should explore integrating ensemble learning algorithms with real-time patient monitoring systems. Such integration could provide continuous health status updates and equip predictive models with the information needed for dynamic, real-time interventions more closely aligned with patient needs.
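SHAP attributions are Shapley values of a model's output. For a handful of features they can be computed exactly by enumerating coalitions. Feature i receives the weighted average of its marginal contribution v(S ∪ {i}) − v(S) over all subsets S of the other features, with weight |S|!(n − |S| − 1)!/n!. The value function below is a hypothetical additive toy with one interaction, not a fitted model. Practical SHAP implementations approximate this sum and define v(S) using background data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   v: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                s = frozenset(S)
                total += weight * (v(s | {i}) - v(s))  # marginal contribution of i to S
        phi[i] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in s are known."""
    out = 0.3 * ("st_slope" in s) + 0.2 * ("cp" in s) + 0.05 * ("chol" in s)
    if "st_slope" in s and "cp" in s:
        out += 0.1  # interaction: Shapley splits this credit equally between the two
    return out


phi = shapley_values(["st_slope", "cp", "chol"], toy_value)
print(phi)
```

The attributions sum to v(all features) − v(no features). This "efficiency" property is what lets SHAP values be read as an additive breakdown of a single prediction.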
- Conference Article
- 10.1109/itcosp.2017.8303115
- Mar 1, 2017
Heart disease is the world's leading health problem, and many heart disease deaths occur during a first heart attack. Patients are also affected by other serious conditions, such as breast cancer, lung cancer, and ventricular and valve disorders. A framework that can efficiently recognize the prevalence of heart disease across thousands of samples almost instantaneously is therefore essential. This paper evaluates the potential of nine classification techniques for heart disease prediction, including decision tree, naive Bayes, neural network, SVM, ANN, and KNN. The proposed approach combines the Apriori algorithm with SVM (support vector machine) for heart disease prediction. Using medical profile attributes such as age, sex, blood pressure, chest pain type, and fasting blood sugar, it predicts the likelihood of a patient developing heart disease, which can help the medical community detect and prevent the disease. The analysis shows that classification-based techniques are highly effective and achieve higher accuracy than previous methods.
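Apriori mines frequent itemsets level by level. It prunes candidates using the property that every subset of a frequent itemset must itself be frequent. The abstract does not describe how the mined itemsets are passed to the SVM, so the compact sketch below covers only the mining step. The attribute=value transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in transactions) / n  # share of transactions containing it

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support(s) >= min_support}
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


records = [
    {"cp=asympt", "exang=yes", "hd=yes"},
    {"cp=asympt", "exang=yes", "hd=yes"},
    {"cp=asympt", "exang=no", "hd=no"},
    {"cp=typical", "exang=no", "hd=no"},
]
for itemset, sup in sorted(apriori(records, 0.5).items(), key=lambda kv: -len(kv[0])):
    print(sorted(itemset), sup)
```

The pruning step is what keeps Apriori tractable: candidates such as {cp=asympt, exang=no, hd=no} are discarded without scanning the data, because one of their 2-item subsets is already infrequent.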
- Conference Article
- 10.5220/0008381505080515
- Jan 1, 2019
Machine learning (ML) is transforming industries, shifting them from delivering conventional products to delivering intelligent ones. Computers analyze large sets of data points and apply predictive relationship modelling in real time to obtain accurate results. In healthcare, ML is adopted to increase efficiency, save money, and save lives: it reduces the cost of medical treatment, optimizes healthcare processes across organizations, and improves healthcare delivery and patient health. It also improves diagnosis and treatment options and empowers individuals to take control of their health, and advances in diagnosis, predictive healthcare, medicines, and patient-facing ML interfaces produce better results. Heart disease covers many medical complications related to the heart. In recent years, ML has spread to every field, and its use in healthcare has increased significantly. This research aims to predict and classify heart disease using ML algorithms. The experimental results are classified into five heart disease stages with values 0, 1, 2, 3, and 4, where 0 denotes no heart disease and 4 denotes severe heart disease. Area Under the Curve (AUC) values are used to quantify the predictive performance of the proposed model. The results are presented as charts and other visuals, which make it easy to analyze, for example, the number of people with chest pain, and the analytical report is presented informatively. This analysis can help doctors and the medical industry in several case studies.
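AUC measures ranking quality rather than accuracy. It equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The pairwise sketch below handles a single binary split, such as any disease (stages 1 to 4) versus no disease (stage 0). The paper does not state how it aggregated AUC over its five stages; one-vs-rest averaging would be one option. The stages and scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC for binary labels: 1 = positive, 0 = negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


stages = [0, 0, 1, 2, 4]                    # hypothetical severity labels
binary = [int(s > 0) for s in stages]       # disease present vs. absent
scores = [0.1, 0.4, 0.35, 0.8, 0.9]         # hypothetical model scores
print(auc(binary, scores))
```

The double loop is O(n_pos × n_neg), which is fine for illustration. A rank-based formula gives the same value in O(n log n) for larger datasets.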
- Conference Article
- 10.1109/wecon.2016.7993480
- Oct 1, 2016
Today's healthcare services have come a long way in providing medical care to patients and protecting them from various diseases. This paper develops a framework based on associative classification techniques applied to a heart dataset for the early diagnosis of heart-related diseases. Heart diseases are hard to diagnose by observation alone, since they can arise suddenly and may prove fatal when uncontrolled. The work is implemented on the Cleveland heart disease dataset from the University of California, Irvine (UCI) Machine Learning Repository to test different data mining techniques. Attributes related to the causes of heart disease, such as gender, age, chest pain type, blood pressure, and blood sugar, can reveal early symptoms of heart disease. Data mining algorithms including Apriori, FP-Growth, Naive Bayes, ZeroR, OneR, J48, and k-nearest neighbor are applied for heart disease prediction. Based on the best results, a heart disease prediction system is developed using a hybrid technique built on classification association rules (CARs), achieving a prediction accuracy of 99.19%.
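A class association rule (CAR) has the form antecedent → class. CARs are commonly ranked by confidence (the fraction of records matching the antecedent that carry the rule's class) and then by support. The minimal CBA-style classifier below predicts with the highest-ranked rule whose antecedent the record satisfies and falls back to a default class otherwise. The paper's hybrid technique may rank or combine rules differently, and the rules and records here are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
Rule = Tuple[FrozenSet[Tuple[str, str]], str]  # (antecedent attribute=value pairs, class)


def matches(rec: Record, antecedent: FrozenSet[Tuple[str, str]]) -> bool:
    return all(rec.get(attr) == val for attr, val in antecedent)


def rule_stats(rule: Rule, data: List[Record], target: str) -> Tuple[float, float]:
    """Return (confidence, support) of one class association rule."""
    antecedent, label = rule
    covered = [r for r in data if matches(r, antecedent)]
    hits = sum(r[target] == label for r in covered)
    conf = hits / len(covered) if covered else 0.0
    return conf, hits / len(data)


def classify(rec: Record, rules: List[Rule], data: List[Record], target: str, default: str) -> str:
    # Rank by confidence, then support, then fewer conditions (more general rules first).
    ranked = sorted(rules, key=lambda r: (*rule_stats(r, data, target), -len(r[0])), reverse=True)
    for antecedent, label in ranked:
        if matches(rec, antecedent):
            return label
    return default


data = [
    {"cp": "asympt", "exang": "yes", "hd": "yes"},
    {"cp": "asympt", "exang": "no", "hd": "yes"},
    {"cp": "asympt", "exang": "no", "hd": "no"},
    {"cp": "typical", "exang": "no", "hd": "no"},
]
rules = [
    (frozenset({("cp", "asympt")}), "yes"),
    (frozenset({("exang", "no")}), "no"),
    (frozenset({("cp", "asympt"), ("exang", "yes")}), "yes"),
]
print(classify({"cp": "typical", "exang": "no"}, rules, data, "hd", default="unknown"))
```

In practice the candidate antecedents would come from a frequent-itemset miner such as Apriori or FP-Growth, restricted to itemsets whose consequent is the class attribute.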
- Book Chapter
- 10.1007/978-981-15-4409-5_41
- Oct 28, 2020
In modern society, heart disease (HD) is a major cause of mortality and morbidity and the leading cause of death worldwide. Detecting HD and preventing deaths from it is a challenging task. Medical diagnosis is highly complicated yet critical, and it must be performed efficiently and accurately. Various data mining techniques assist healthcare professionals in diagnosing heart disease. This work introduces a heart disease prediction method with three steps: preprocessing, feature selection, and a learning algorithm. Before classification, important features are selected with the updated frequency-based bat algorithm (UFBBA), in which frequency values are computed from the features: more important features receive higher frequencies, and less important features receive lower ones. The features selected by UFBBA are then classified with a Vote classifier, which yields better accuracy than the other classifiers. The experimental data come from the University of California, Irvine (UCI) Cleveland dataset, and results are measured in terms of accuracy, F-measure, precision, and recall.
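For reference, the standard bat algorithm that frequency-based variants build on updates each bat i's frequency f_i, velocity v_i, and position x_i as shown below. Here β is drawn uniformly from [0, 1] and x_* is the current best solution. The chapter's specific rule for deriving frequencies from feature importance is not given in the abstract and is not reproduced here. A binary feature-selection variant would additionally map each x_i to a 0/1 feature mask, for example with a sigmoid transfer function.

```latex
\begin{aligned}
f_i &= f_{\min} + (f_{\max} - f_{\min})\,\beta, \qquad \beta \sim \mathcal{U}(0,1) \\
\mathbf{v}_i^{t} &= \mathbf{v}_i^{t-1} + \left(\mathbf{x}_i^{t-1} - \mathbf{x}_*\right) f_i \\
\mathbf{x}_i^{t} &= \mathbf{x}_i^{t-1} + \mathbf{v}_i^{t}
\end{aligned}
```

A larger f_i scales the pull toward x_* more strongly. Tying frequency to feature importance, as UFBBA reportedly does, therefore changes how aggressively each bat moves toward the best-known feature subset.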
- Research Article
- 10.58346/jisis.2026.i1.028
- Feb 27, 2026
- Journal of Internet Services and Information Security
Intensive research is improving the detection and treatment of heart disorders caused by COVID-19. Mining and recording medical data offers considerable potential for maintaining patient records, made possible by the convergence of improved technology and medical diagnostic models. Diagnosing heart disease requires analyzing how risk factors in a patient's clinical history interact, so a careful analysis of multiple mechanisms in patient data is used to predict heart disease before COVID-19 infection. The main attributes required for detecting cardiac disorders due to COVID-19 are identified with a feature selection model. Critical patient history details, such as age, smoking habits, physical activity, stress levels, gender, previous chest pain occurrences, diabetes, electrocardiogram (ECG) readings, dietary patterns, chest pain type, and troponin levels, are considered for predicting heart disease. AI techniques including deep neural networks with SVM (d-SVM) were applied, and the results were compared across two datasets from the heart disease database. These methods are used both to select features from the database and on the full feature set of the data repository. Our proposed model, which takes the selected features as input, achieves an accuracy of 95% and enables early heart disease prediction. However, moving a model as computationally intensive as this from the laboratory into real-time clinical practice depends on an architecture for scalable, fault-tolerant deployment. To this end, we introduce a Cloud Computing and Service Deployment paradigm specifically tailored to executing the d-SVM model.
It leverages the cloud's elasticity and high availability to handle fluctuating diagnostic loads in healthcare networks. By exposing the d-SVM as an Internet-based service through an Application Programming Interface (API), this work maximizes the model's potential for large-scale, sustainable, and cost-effective clinical use.
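Exposing a trained model as an HTTP API mostly comes down to three steps: validate a JSON request, call the model, and return JSON. The sketch below uses only Python's standard library. The `/predict` path, the field names, and the linear scorer standing in for the d-SVM are all hypothetical. A production cloud deployment would add authentication, logging, and a proper application server behind a load balancer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

FEATURES = ["age", "troponin", "chest_pain_type"]  # hypothetical input schema
WEIGHTS = {"age": 0.02, "troponin": 3.0, "chest_pain_type": 0.5}
BIAS = -2.5  # hypothetical linear scorer standing in for the d-SVM


def handle_predict(body: bytes) -> Tuple[int, dict]:
    """Pure request logic: returns (HTTP status, JSON-serialisable payload)."""
    try:
        payload = json.loads(body)
        x = {f: float(payload[f]) for f in FEATURES}
    except (ValueError, KeyError, TypeError) as exc:  # bad JSON, missing or non-numeric field
        return 400, {"error": f"invalid input: {exc}"}
    score = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 200, {"score": score, "prediction": int(score >= 0)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, out = handle_predict(self.rfile.read(length))
        data = json.dumps(out).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Start a single-process server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()


print(handle_predict(b'{"age": 60, "troponin": 1.0, "chest_pain_type": 3}'))
```

Keeping `handle_predict` free of networking code means it can be unit-tested directly and reused behind any server or cloud function runtime. Call `serve()` to run it locally.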
- Research Article
- 10.31185/wjps.125
- Mar 26, 2023
- Wasit Journal of Pure sciences
Heart disease is a complex, life-threatening ailment that poses a significant mortality risk worldwide, with nearly a third of global deaths attributable to heart-related conditions. Early prediction and detection of heart disease are of utmost importance in medicine, as they may save numerous lives. However, the shortage of cardiac expertise in many countries and high misdiagnosis rates highlight the need for accurate and efficient prediction methods. Machine learning approaches could address this need, particularly in handling the large amounts of data generated by medical sectors and hospitals. This study compared the performance and accuracy of several supervised machine learning algorithms for heart disease prediction using a dataset obtained from the PhysioNet databases. The classifiers applied were Artificial Neural Network (ANN), Gradient Boosting, Decision Tree, Naive Bayes, and Random Forest. The ANN achieved the highest accuracy, 94.1%, with both sensitivity and specificity of 94.1%. The study concludes that supervised machine learning techniques can forecast heart disease successfully and show strong potential for practical application.
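Sensitivity and specificity come directly from the binary confusion matrix: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). Reporting both alongside accuracy guards against a model that scores well simply by favouring the majority class. The labels in the usage line are hypothetical.

```python
from typing import Dict, Sequence


def binary_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for labels 1 = disease, 0 = healthy."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true-negative rate
    }


print(binary_rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))
```

Accuracy is a prevalence-weighted average of sensitivity and specificity. It can therefore sit close to either one depending on class balance, which is why all three are worth reporting.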