An ensemble machine learning model for the prediction of danger zones: Towards a global counter-terrorism
Terrorism can be described as the use of violence against persons or property to intimidate or coerce a government or its citizens in pursuit of certain political or social objectives. It is a global problem that has caused loss of life and property and is known to harm tourism and the global economy. Terrorism has also been associated with high levels of insecurity, and most nations of the world are interested in any research that can reduce its menace. Most research on terrorism has focused on measures to fight terrorism or to reduce the activities of terrorists, but there have been limited efforts on terrorism prediction. The aim of this work is to develop an ensemble machine learning model that combines Support Vector Machine and K-Nearest Neighbor for predicting the continents susceptible to terrorism. Data were obtained from the Global Terrorism Database, and data preprocessing included data cleaning and dimensionality reduction. Two feature selection techniques, Chi-squared and Information Gain, as well as a hybrid of both, were applied to the dataset before modeling. Ensemble machine learning models were then constructed and applied to the selected features. The Chi-squared, Information Gain and hybrid-based features produced accuracies of 94.17%, 97.34% and 97.81%, respectively, at predicting danger zones, with sensitivity scores of 82.3%, 88.7% and 92.2% and specificity scores of 98%, 90.5% and 99.67%. These results imply that the hybrid-based features produced the best results among the feature selection techniques at predicting terrorism locations. Our results show that an ensemble machine learning model can accurately predict terrorism locations.
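The two filter criteria named above can be sketched for categorical features as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how the hybrid was formed, so `hybrid_selection` assumes (hypothetically) that a feature is kept only if it ranks in the top k under both Chi-squared and Information Gain. The column names in the usage note are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus label entropy conditioned on a categorical feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature-by-label contingency table."""
    n = len(labels)
    feature_counts, label_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for x, cx in feature_counts.items():
        for y, cy in label_counts.items():
            expected = cx * cy / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def hybrid_selection(columns, labels, k):
    """Hypothetical hybrid: keep features in the top k under BOTH criteria."""
    chi = {name: chi_squared(col, labels) for name, col in columns.items()}
    ig = {name: information_gain(col, labels) for name, col in columns.items()}
    return top_k(chi, k) & top_k(ig, k)
```

For example, `hybrid_selection({"region": ["a", "a", "b", "b"], "weekday": ["x", "y", "x", "y"]}, [1, 1, 0, 0], 1)` returns `{"region"}`, because `weekday` is independent of the label and scores 0 on both criteria. Taking the intersection is a conservative choice; a union or a combined rank would keep more features.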
- Research Article
8
- 10.1177/20552076231173225
- Jan 1, 2023
- Digital health
Electronic health records provide the opportunity to use machine learning techniques to identify undiagnosed individuals likely to have a given disease, who could then benefit from further medical screening and case finding, reducing the number needed to screen and bringing convenience and healthcare cost savings. Ensemble machine learning models, which combine multiple prediction estimates into one, are often said to provide better predictive performance than non-ensemble models. Yet, to our knowledge, no literature review summarises the use and performance of different types of ensemble machine learning models in the context of medical pre-screening. We aimed to conduct a scoping review of the literature reporting the derivation of ensemble machine learning models for screening of electronic health records. We searched the EMBASE and MEDLINE databases across all years, applying a formal search strategy using terms related to medical screening, electronic health records and machine learning. Data were collected, analysed, and reported in accordance with the PRISMA scoping review guideline. A total of 3355 articles were retrieved, of which 145 met our inclusion criteria and were included in this study. Ensemble machine learning models were increasingly employed across several medical specialties and often outperformed non-ensemble approaches. Ensemble machine learning models with complex combination strategies and heterogeneous classifiers often outperformed other types of ensemble machine learning models but were used less often. The methodologies, processing steps and data sources of ensemble machine learning models were often not clearly described. Our work highlights the importance of deriving and comparing the performance of different types of ensemble machine learning models when screening electronic health records and underscores the need for more comprehensive reporting of the machine learning methodologies employed in clinical research.
- Research Article
- 10.20428/jst.v30i6.2864
- May 30, 2025
- Journal of Science and Technology
Depression is a mental illness that can make a person’s life difficult and can eventually lead to suicide. Depressed individuals who do not receive timely attention may deteriorate and eventually attempt suicide. Depression and suicide are a growing global health concern that needs to be adequately addressed. This study proposes an ensemble learning model that uses demographic data to detect depression and suicide attempts, together with a web-based application system that guides individuals away from suicide. The Forever Alone demographic dataset, downloaded from the Kaggle online data repository, was used; because it was imbalanced, it was balanced using the synthetic minority oversampling technique (SMOTE). The dataset was split into 60/40, 70/30 and 80/20 train/test partitions; the 80/20 split performed best and is the one used and reported in this study. The study employs an ensemble machine learning model, specifically AdaBoost with Extra Trees as base estimators, for prediction. Results show that the AdaBoost ensemble model outperformed all other machine learning algorithms across all evaluation metrics on the balanced dataset, with 82.00% recall and 78.69% accuracy for depression and 93.85% recall and 90.60% accuracy for suicide attempts. AdaBoost's sequential reweighting of misclassified instances improves model performance, especially under class imbalance, and it was therefore used for the prediction system. The study supports the effectiveness of ensemble machine learning models for predicting depression and suicide attempts. Ethical issues were also discussed in the study.
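The SMOTE balancing step mentioned above creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation idea on numeric features; it is not the study's code, and the `k` and `seed` defaults are assumptions. In practice a library implementation such as `SMOTE` from the imbalanced-learn package would normally be used, typically fitted on the training portion only so that synthetic rows are not derived from test samples.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class points by SMOTE-style interpolation.

    For each new point: pick a random minority sample, pick one of its k
    nearest minority-class neighbours, and place the new point at a random
    position on the line segment between them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the sample itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic
```

With `minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]`, `smote(minority, 10, k=2)` returns ten points that all lie on the edges of the triangle spanned by the three originals, which illustrates why SMOTE adds variety without leaving the minority region.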
- Research Article
4
- 10.1002/pro.5007
- May 9, 2024
- Protein Science
The identification of an effective inhibitor is an important starting step in drug development. Unfortunately, many issues such as the characterization of protein binding sites, the screening library, materials for assays, etc., make drug screening a difficult proposition. As the size of screening libraries increases, more resources will be inefficiently consumed. Thus, new strategies are needed to preprocess and focus a screening library towards a targeted protein. Herein, we report an ensemble machine learning (ML) model to generate a CDK8-focused screening library. The ensemble model consists of six different algorithms optimized for CDK8 inhibitor classification. The models were trained using a CDK8-specific fragment library along with molecules with known CDK8 activity. The optimized ensemble model processed a commercial library containing 1.6 million molecules. This resulted in a CDK8-focused screening library containing 1,672 molecules, a reduction of more than 99.90%. The CDK8-focused library was then subjected to molecular docking, and 25 candidate compounds were selected. Enzymatic assays confirmed six CDK8 inhibitors, with one compound producing an IC50 value of ≤100 nM. Analysis of the ensemble ML model reveals the role of the CDK8 fragment library during training. Structural analysis reveals the hit compounds to be structurally novel CDK8 inhibitors. Together, the results highlight a pipeline for curating a focused library for a specific protein target, such as CDK8.
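The abstract does not state how the six classifiers' outputs were combined when screening the commercial library. One simple possibility, shown here purely as an assumption, is a consensus vote in which a molecule is kept only if at least `min_votes` classifiers call it active. The lambdas in the usage note are hypothetical stand-ins for trained models, and the descriptor names `mw` and `logp` are illustrative.

```python
def consensus_filter(library, models, min_votes):
    """Keep molecules that at least `min_votes` classifiers label as active.

    Each model is any callable returning True (predicted active) or False.
    """
    kept = []
    for molecule in library:
        votes = sum(1 for model in models if model(molecule))
        if votes >= min_votes:
            kept.append(molecule)
    return kept


def percent_reduction(original_size, filtered_size):
    """Percentage of the original library removed by the filter."""
    return 100.0 * (1 - filtered_size / original_size)
```

For example, with `models = [lambda m: m["mw"] < 500, lambda m: m["logp"] < 5, lambda m: m["mw"] > 300]`, raising `min_votes` from 2 to 3 makes the filter stricter, and `percent_reduction` reports how much of the input was discarded. A unanimous vote maximizes the reduction at the cost of possibly discarding true actives.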
- Research Article
1
- 10.1186/s12933-025-02911-5
- Sep 30, 2025
- Cardiovascular diabetology
Early mortality prediction in critically ill patients with cardiovascular disease remains challenging. This study aimed to develop and validate an ensemble machine learning (ML) model to predict 30-day mortality, comparing its performance with conventional severity scores and interrogating the incremental prognostic value of the stress hyperglycemia ratio (SHR). A retrospective cohort of 1,595 ICU patients with cardiovascular disease combined with diabetes (2008-2022) was analyzed. SHR was calculated as admission glucose divided by estimated average glucose (eAG) derived from HbA1c. Six ML models (eXtreme Gradient Boosting [XGBoost], Decision Tree [DT], Random Forest [RF], Artificial Neural Network [ANN], Logistic Regression [LR], and Support Vector Machine [SVM]) were trained on 80% of the data, and the top three performers were combined into an ensemble model. Model performance was evaluated using area under the curve (AUC), precision-recall, calibration, and clinical utility metrics. The 30-day mortality rate in the entire cohort was 10.8% (n = 173 deaths). The ensemble model demonstrated superior predictive performance, with an AUC of 0.912 (95% CI: 0.888-0.936), outperforming both the individual ML models (XGBoost, AUC = 0.903) and traditional scoring systems (APS III/SOFA/SAPS II AUCs ≤ 0.742; all P < 0.001). The six most important predictors were anti-hypertensives, aspirin, blood urea nitrogen (BUN), white blood cell (WBC) count, age, and red blood cell (RBC) count, and Shapley Additive Explanations analysis revealed clinically meaningful patterns: a nonlinear risk escalation with age, linear risk increases with rising BUN and bilirubin levels, a protective effect associated with higher RBC counts, and increased early death risk at both low and high WBC levels.
While SHR significantly improved the performance of traditional scoring systems (e.g., increasing SOFA AUC from 0.741 to 0.757, P = 0.010), its addition to the ensemble model provided limited incremental benefit (ΔAUC = -0.032, P = 0.094). External validation in an independent cohort (n = 307) confirmed the model's robustness (AUC = 0.891, 95% CI: 0.864-0.917), with decision curve analysis demonstrating superior clinical utility across a wide range of risk thresholds. The ensemble ML model outperformed conventional prognostic tools in predicting 30-day mortality, with SHR augmenting traditional tools but not the ensemble ML model. This approach offers a reliable, interpretable framework for risk stratification in high-risk cardiovascular patients.
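Two computations in this abstract can be made concrete: the SHR itself and a "top three performers" ensemble. The abstract does not say which eAG equation was used, so the sketch assumes the widely used ADAG regression, eAG (mg/dL) = 28.7 × HbA1c (%) − 46.7. It also does not say how the three models were combined; simple averaging of their predicted probabilities, with models ranked by AUC on validation data, is an assumption here, and the model names in the test are illustrative.

```python
def estimated_average_glucose(hba1c_percent):
    """ADAG regression (assumed here): eAG in mg/dL = 28.7 * HbA1c(%) - 46.7."""
    return 28.7 * hba1c_percent - 46.7


def stress_hyperglycemia_ratio(admission_glucose_mg_dl, hba1c_percent):
    """SHR = admission glucose / estimated average glucose (both in mg/dL)."""
    return admission_glucose_mg_dl / estimated_average_glucose(hba1c_percent)


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly, ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_average_ensemble(model_scores, labels, k=3):
    """Pick the k models with the highest AUC and average their probabilities.

    `model_scores` maps model name -> predicted probabilities on a validation
    set (ranking on the test set would leak information into model choice).
    """
    ranked = sorted(model_scores, key=lambda m: auc(model_scores[m], labels), reverse=True)
    chosen = ranked[:k]
    averaged = [sum(model_scores[m][i] for m in chosen) / len(chosen) for i in range(len(labels))]
    return chosen, averaged
```

As a worked example, a hypothetical patient with HbA1c 7.0% has eAG = 28.7 × 7.0 − 46.7 = 154.2 mg/dL, so an admission glucose of 154.2 mg/dL gives SHR = 1.0; acute hyperglycemia above the chronic baseline pushes SHR above 1.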
- Research Article
30
- 10.1016/j.envres.2023.116131
- May 18, 2023
- Environmental Research
Digital mapping of soil organic carbon density in China using an ensemble model
- Research Article
189
- 10.1016/j.conbuildmat.2020.118271
- Feb 17, 2020
- Construction and Building Materials
An ensemble machine learning approach for prediction and optimization of modulus of elasticity of recycled aggregate concrete
- Research Article
195
- 10.1016/j.cemconcomp.2021.104295
- Oct 13, 2021
- Cement and Concrete Composites
This study aims to provide an efficient and accurate machine learning (ML) approach for predicting the creep behavior of concrete. Three ensemble machine learning (EML) models are selected: Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LGBM). First, the creep data in the Northwestern University (NU) database are preprocessed by a prebuilt XGBoost model and then split into a training set and a testing set. Then, using Bayesian optimization and 5-fold cross-validation, the three EML models are tuned to high accuracy (R² = 0.953, 0.947 and 0.946 for LGBM, XGBoost and RF, respectively). On the testing set, the EML models show significantly higher accuracy than the equation proposed in the fib Model Code 2010 (R² = 0.377). Finally, SHapley Additive exPlanations (SHAP) values, based on cooperative game theory, are calculated to interpret the predictions of the EML models. The SHAP values identify the five most influential parameters for concrete creep compliance: time since loading, compressive strength, age at loading, relative humidity during the test and temperature during the test. The patterns captured by the three EML models are consistent with the theoretical understanding of the factors that influence concrete creep, which indicates that the proposed EML models make physically reasonable predictions.
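The R² values quoted above and the 5-fold cross-validation used during tuning can be sketched with two small helpers. This assumes the usual coefficient of determination, R² = 1 − SS_res / SS_tot, and contiguous folds; in practice rows are usually shuffled before splitting, and the Bayesian optimization loop the authors used is not reproduced here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.

    Returns a list of (train_indices, test_indices) pairs; each index appears
    in exactly one test fold.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

A model that always predicts the mean scores R² = 0, so the fib Model Code 2010 value of 0.377 means the code equation explains only about a third of the variance in the test set, against roughly 95% for the tuned ensembles.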
- Research Article
- 10.3390/medicina61111945
- Oct 30, 2025
- Medicina
Background and Objectives: We aimed to apply an ensemble machine learning model to diagnose thyroid cartilage invasion on computed tomography (CT) images in laryngeal cancers and to evaluate the model's diagnostic performance. Materials and Methods: A total of 313 patients were divided into two groups: a cartilage invasion group and a no cartilage invasion group. At least four CT slices were randomly selected for each patient, giving a total of 1251 images: 619 axial CT images from the no cartilage invasion group and 632 axial CT images from the cartilage invasion group. We reviewed the CT images and histopathological diagnoses in all cases to establish invasion-positive or invasion-negative status as the ground truth. The ensemble model, comprising ResNet50 and MobileNet deep learning architectures, was applied to the CT images. Results: On the test dataset, the ensemble model achieved an area under the curve (AUC) of 0.99 and an accuracy of 96.54%, a very high level of performance in detecting thyroid cartilage invasion. Conclusions: The ensemble machine learning model is an effective method for detecting neoplastic infiltration of the thyroid cartilage. It may also be a valuable diagnostic tool for clinicians in assessing disease prognosis and determining appropriate treatment strategies in laryngeal cancers, and it could be integrated into future clinical practice in laryngology and head and neck surgery for detecting neoplastic cartilage infiltration.
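Because the 1251 images come from 313 patients (at least four slices each), how the test set was formed matters: if slices from the same patient land in both training and test sets, test accuracy can be optimistic. The abstract does not say how the split was made, so the following is only a generic sketch of a patient-level split; the image and patient identifiers, the 20% test fraction and the seed are assumptions.

```python
import random
from collections import defaultdict


def patient_level_split(image_patient, test_fraction=0.2, seed=0):
    """Split image IDs so that every patient's images land in exactly one subset.

    `image_patient` maps image ID -> patient ID. Patients (not images) are
    shuffled and assigned to the test set.
    """
    images_by_patient = defaultdict(list)
    for image_id, patient_id in image_patient.items():
        images_by_patient[patient_id].append(image_id)
    patients = sorted(images_by_patient)
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [img for p in patients if p not in test_patients for img in images_by_patient[p]]
    test = [img for p in patients if p in test_patients for img in images_by_patient[p]]
    return train, test
```

The design choice is simply to shuffle and split at the patient level; when class balance also matters, the same grouping idea can be combined with stratification by invasion status.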
- Research Article
136
- 10.1016/j.epsr.2020.106904
- Oct 31, 2020
- Electric Power Systems Research
Ensemble machine learning models for the detection of energy theft
- Research Article
115
- 10.1016/j.geodrs.2020.e00256
- Feb 5, 2020
- Geoderma Regional
Digital mapping of soil organic carbon using ensemble learning model in Mollisols of Hyrcanian forests, northern Iran
- Research Article
5
- 10.1016/j.matpr.2024.04.081
- Apr 1, 2024
- Materials Today: Proceedings
Comparative analysis of conventional and ensemble machine learning models for predicting split tensile strength in thermal stressed SCM-blended lightweight concrete
- Research Article
2
- 10.1186/s40537-024-00966-x
- Jul 24, 2024
- Journal of Big Data
Objective: This study was designed to develop and validate a robust predictive model for one-year mortality in elderly coronary heart disease (CHD) patients with anemia using machine learning methods. Methods: Demographics, tests, comorbidities, and drugs were collected for a cohort of 974 elderly patients with CHD. A prospective analysis was performed to evaluate the predictive performance of the developed models. External validation of the models was performed in a series of 112 elderly CHD patients with anemia. Results: The overall one-year mortality was 43.6%. Risk factors included heart rate, chronic heart failure, tachycardia and β receptor blockers. Protective factors included hemoglobin, albumin, high density lipoprotein cholesterol, estimated glomerular filtration rate (eGFR), left ventricular ejection fraction (LVEF), aspirin, clopidogrel, calcium channel blockers, angiotensin converting enzyme inhibitors (ACEIs)/angiotensin receptor blockers (ARBs), and statins. Compared with other algorithms, an ensemble machine learning model performed best, with an area under the curve (95% confidence interval) of 0.828 (0.805–0.870) and a Brier score of 0.170. Calibration and density curves further confirmed the favorable predicted probability and discriminative ability of the ensemble machine learning model. External validation of the ensemble model also showed good performance, with an area under the curve (95% confidence interval) of 0.825 (0.734–0.916) and a Brier score of 0.185. Patients in the high-risk group had a more than six-fold higher probability of one-year mortality than those in the low-risk group (P < 0.001). SHapley Additive exPlanations identified the top five factors associated with one-year mortality as hemoglobin, albumin, eGFR, LVEF, and ACEIs/ARBs. Conclusions: This model identifies key risk factors and protective factors, providing valuable insights for improving risk assessment, informing clinical decision-making and performing targeted interventions.
It outperforms other algorithms in predictive performance and provides significant opportunities for personalized risk mitigation strategies, with clinical implications for improving patient care.
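The two summary numbers used above, the Brier score and the high-risk versus low-risk comparison, reduce to short formulas. The sketch below assumes the standard definitions (mean squared error of predicted probabilities against 0/1 outcomes, and a ratio of event rates); the abstract does not give the cutoff used to form the risk groups, so group membership is taken as given.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def risk_ratio(outcomes_high, outcomes_low):
    """Event rate in the high-risk group divided by the event rate in the low-risk group."""
    rate_high = sum(outcomes_high) / len(outcomes_high)
    rate_low = sum(outcomes_low) / len(outcomes_low)
    return rate_high / rate_low
```

A useful reference point: on data with a 43.6% event rate, a model that always predicts 0.436 has a Brier score of 0.436 × 0.564 ≈ 0.246, so the reported 0.170 (lower is better) reflects a real gain over predicting the base rate. Similarly, a 75% event rate in one group against 12.5% in another gives a risk ratio of 6.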
- Research Article
- 10.3760/cma.j.cn112139-20240411-00180
- Oct 1, 2024
- Zhonghua wai ke za zhi [Chinese journal of surgery]
Objective: To construct an ensemble machine learning model for predicting the occurrence of clinically relevant postoperative pancreatic fistula (CR-POPF) after pancreaticoduodenectomy and to evaluate its application value. Methods: This was a predictive modeling study. Clinical data of 421 patients undergoing pancreaticoduodenectomy in the Department of Pancreatic Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology from June 2020 to May 2023 were retrospectively collected. There were 241 males (57.2%) and 180 females (42.8%), with an age of (59.7 ± 11.0) years (range: 12 to 85 years). The patients were divided into a training set (315 cases) and a test set (106 cases) by stratified random sampling in a 3:1 ratio. Recursive feature elimination was used to screen features, nine machine learning algorithms were used for modeling, the three groups of models with better fitting ability were selected, and the ensemble model was constructed by model fusion using the Stacking algorithm. Model performance was evaluated with various metrics, and the interpretability of the optimal model was analyzed with the Shapley Additive Explanations (SHAP) method. The patients in the test set were divided into risk groups according to the prediction probability (P) of the alternative pancreatic fistula risk score (a-FRS) system. The a-FRS score was validated and its predictive efficacy was compared with that of the model. Results: Among the 421 patients, CR-POPF occurred in 84 cases (20.0%). In the test set, the Stacking ensemble model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.823, an accuracy of 0.83, an F1 score of 0.63, and a Brier score of 0.097.
The SHAP summary plot showed that the top 9 factors affecting CR-POPF after pancreaticoduodenectomy were pancreatic duct diameter, CT value ratio, postoperative serum amylase, IL-6, body mass index, operative time, albumin difference before and after surgery, procalcitonin and IL-10. The effect of each feature on the occurrence of CR-POPF showed a complex nonlinear relationship. The risk of CR-POPF increased when pancreatic duct diameter was < 3.5 mm, CT value ratio < 0.95, postoperative serum amylase concentration > 150 U/L, IL-6 level > 280 ng/L, operative time > 350 minutes, and when albumin decreased by more than 10 g/L. The AUC of a-FRS in the test set was 0.668, lower than that of the Stacking ensemble machine learning model. Conclusion: The ensemble machine learning model constructed in this study can predict the occurrence of CR-POPF after pancreaticoduodenectomy and has the potential to be a tool for personalized diagnosis and treatment after pancreaticoduodenectomy.
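Stacking, as used above, trains a meta-model on the predictions of several base models. A key detail is that those predictions should be out-of-fold, so the meta-model never learns from a base model scoring rows it was trained on. The sketch below assumes a logistic-regression meta-learner and uses hypothetical one-feature "centroid" base learners in place of the three selected algorithms, which the abstract does not name; the toy data in the test are invented.

```python
import math


def centroid_learner(feature):
    """Hypothetical base learner: scores a sample by its closeness to each class
    mean of one feature (near the positive-class mean -> score near 1)."""
    def fit(X, y):
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)

        def predict(x):
            d_pos, d_neg = abs(x[feature] - mu_pos), abs(x[feature] - mu_neg)
            return (d_neg + 1e-12) / (d_pos + d_neg + 2e-12)
        return predict
    return fit


def out_of_fold_scores(X, y, learners, k=5):
    """Meta-features: each row is scored only by base models that never saw it.

    Uses interleaved folds and assumes every training fold contains both classes.
    """
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        for j, fit in enumerate(learners):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                meta[i][j] = model(X[i])
    return meta


def fit_logistic(Z, y, lr=0.5, epochs=5000):
    """Meta-learner: logistic regression trained by full-batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)

    def prob(z):
        return 1.0 / (1.0 + math.exp(-(b + sum(wi * zi for wi, zi in zip(w, z)))))

    for _ in range(epochs):
        errors = [prob(z) - t for z, t in zip(Z, y)]
        b -= lr * sum(errors) / n
        w = [wi - lr * sum(e * z[j] for e, z in zip(errors, Z)) / n for j, wi in enumerate(w)]
    return prob


def fit_stacking(X, y, learners, k=5):
    """Train the meta-learner on out-of-fold scores, then refit base models on all data."""
    meta_model = fit_logistic(out_of_fold_scores(X, y, learners, k), y)
    base_models = [fit(X, y) for fit in learners]
    return lambda x: meta_model([m(x) for m in base_models])
```

Training the meta-learner on in-sample base predictions instead would reward base models that overfit, which is the failure mode out-of-fold scoring is meant to prevent.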
- Research Article
- 10.38094/jastt62264
- Aug 8, 2025
- Journal of Applied Science and Technology Trends
Child mortality is a major problem worldwide, especially in low- and middle-income nations with large disparities in health care and social conditions. This investigation seeks to create a predictive model for child mortality and to pinpoint the key factors that contribute to it, using machine learning (ML) methods. The dataset includes features such as parental age, maternal education, birth weight, wealth index, and access to healthcare services. Thirteen machine learning classifiers were used, grouped into four categories: Traditional Models (Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Naive Bayes), Tree-Based Models (Decision Tree, Random Forest, Extra Trees), Boosting Models (AdaBoost, Gradient Boosting, XGBoost), and Ensemble Learning Models (Soft Voting, Hard Voting, Stacking). Each model was evaluated using classification metrics, including Accuracy, Precision, Recall, and F1-Score, within a 10-fold cross-validation framework to assess robustness. Results indicate that ensemble methods, particularly AdaBoost, achieved the highest predictive accuracy, with perfect scores on all metrics (1.00). XGBoost and Stacking also showed strong and consistent performance. The findings indicate that ensemble learning methods are effective in predicting child mortality and can help policymakers and healthcare planners identify high-risk populations and implement targeted interventions to reduce child mortality.
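The difference between the Hard Voting and Soft Voting ensembles listed above is whether the base models' class labels or their predicted probabilities are combined. A minimal sketch for binary classification, assuming equal model weights and a 0.5 threshold (the study's settings are not given):

```python
from collections import Counter


def hard_vote(label_predictions):
    """Majority vote; label_predictions[m][i] is model m's class label for sample i."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_predictions)]


def soft_vote(probability_predictions, threshold=0.5):
    """Average the models' positive-class probabilities, then threshold the mean."""
    n_models = len(probability_predictions)
    return [int(sum(probs) / n_models >= threshold) for probs in zip(*probability_predictions)]
```

The two rules can disagree: if three models give a sample positive-class probabilities of 0.9, 0.4 and 0.45, hard voting sees one "yes" and two "no" votes and predicts 0, while soft voting averages to about 0.58 and predicts 1, because one confident model outweighs two uncertain ones.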
- Research Article
- 10.3389/fpls.2025.1576212
- Apr 15, 2025
- Frontiers in plant science
The rapid and non-destructive estimation of rice aboveground biomass (AGB) is vital for accurate growth assessment and yield prediction. However, vegetation indices (VIs) often suffer from saturation due to high canopy coverage and vertical organs, limiting their accuracy across multiple growth stages. Therefore, this study utilizes UAV-acquired RGB and multi-spectral (MS) images during several critical rice stages to explore the potential of multi-source data fusion for accurately and cost-effectively estimating rice AGB. High-frequency texture features were extracted from RGB images using discrete wavelet transform (DWT), while low-order color moments in RGB and Lab color spaces were calculated. VIs were derived from MS images. Feature selection combined statistical analysis and modeling techniques, with collinearity removed through the Variance Inflation Factor (VIF). The relationships between AGB and the selected features were then analyzed using multiple fitting functions. Both single-type and multi-type features were used to develop individual and ensemble machine learning (ML) models for rice AGB estimation. The findings indicate that: (i) Single-type features result in significant errors and low accuracy within the same sensor, but multi-feature fusion improves performance. (ii) Fusing RGB and MS image features enhances AGB estimation accuracy over single-sensor features. (iii) Ensemble ML models outperform individual models, providing higher accuracy and stability, with the best model achieving an R² of 0.8564 and RMSE of 169.32 g/m². This study demonstrates that multi-source UAV image feature fusion with ensemble learning effectively leverages complementary data strengths, offering an efficient solution for monitoring rice AGB across growth stages.
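The VIF screening step above uses VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on all the other features by ordinary least squares. The abstract gives no cutoff, so the threshold of 10 in `drop_collinear` is an assumption (a common rule of thumb), as is the strategy of repeatedly dropping the worst feature. This dependency-free sketch solves the normal equations directly; a library routine such as `variance_inflation_factor` in statsmodels would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_r_squared(y, regressors):
    """R^2 of an ordinary least-squares fit of y on an intercept plus the regressors."""
    rows = [[1.0] + [reg[i] for reg in regressors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    mean = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(features):
    """Variance inflation factor of each feature; `features` maps name -> column."""
    result = {}
    for name, column in features.items():
        others = [col for other, col in features.items() if other != name]
        r2 = ols_r_squared(column, others)
        result[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result


def drop_collinear(features, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all VIFs <= threshold."""
    remaining = dict(features)
    while len(remaining) > 1:
        scores = vif(remaining)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del remaining[worst]
    return list(remaining)
```

Uncorrelated features have VIF = 1, while two nearly proportional vegetation indices (for instance, one roughly twice the other) can easily exceed 100, in which case `drop_collinear` keeps only one of them.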