
The Dark Side of Love: Prediction of Digital Intimate Partner Violence and Associated Factors Among University Students Using Machine Learning.

TL;DR

This study used machine learning models, notably XGBoost with an AUC of 0.996, to predict digital intimate partner violence among university students, identifying key risk factors such as domestic violence history, urban residence, and frequent digital communication, informing targeted prevention strategies.

Abstract

This study aimed to identify key risk factors and predict digital intimate partner violence (DIPV) exposure and perpetration among university students using machine learning (ML) algorithms. A cross-sectional online survey was conducted with 1,764 university students (age range = 18-41 years, M = 20.8; 87.2% female, 12.8% male) selected through snowball sampling from a large public university in Türkiye. The survey included sociodemographic, lifestyle, and relationship variables, along with the Digital Intimate Partner Violence Scale. Six ML models were used: Logistic Regression (LR), XGBoost, Gradient Boosting (GB), Random Forest (RF), LightGBM, and Support Vector Machines (SVM). Model performance was evaluated using accuracy, precision, recall, F1 score, and receiver operating characteristics-area under the curve (ROC-AUC). XGBoost achieved the highest performance (AUC = 0.996), followed closely by RF and LightGBM (AUC = 0.995). LR and GB also performed well (AUC = 0.992), while SVM had slightly lower performance (AUC = 0.989). SHapley Additive exPlanations analysis revealed that domestic violence history, urban residence, father's low education, short relationship duration, and frequent digital communication were risk factors. High income perception and non-smoking reduced DIPV risk. ML models, particularly XGBoost, effectively predict DIPV. Socioeconomic and psychosocial factors should be targeted in prevention efforts, alongside digital literacy and support services.
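As a reading aid for the ROC-AUC figures quoted above, the sketch below computes ROC-AUC from its pairwise definition: the share of (positive, negative) pairs in which the positive example gets the higher score. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (1 = positive class); NOT taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.20, 0.77]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy ROC-AUC = {toy_auc:.4f}")
```

In the toy data, 15 of the 16 positive-negative pairs are ordered correctly, so the value is 15/16. An AUC of 0.996 would mean that almost every such pair is ordered correctly.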

Similar Papers
  • Research Article
  • 10.1136/bmjpo-2025-004000
Predicting stillbirth and identifying key maternal risk factors using machine learning
  • Jul 1, 2025
  • BMJ Paediatrics Open
  • Merga Abdissa Aga

Background: Stillbirth remains a major public health concern, particularly in low-income and middle-income countries. Identifying maternal and obstetric determinants is essential for prevention and targeted interventions. Logistic regression offers a baseline predictive model, while machine learning (ML) methods, such as Random Forest (RF) and Extreme Gradient Boosting, can improve predictive accuracy and highlight key risk factors through feature importance. This study aimed to predict stillbirth and identify influential maternal and obstetric predictors among pregnant women in Ethiopia using ML models. Methods: A retrospective cross-sectional study was conducted using maternal and obstetric records from Bishoftu General Hospital, Ethiopia. Predictors included maternal age, weight, gravidity, gestational age at admission and delivery, history of pre-eclampsia, antenatal care visits, pregnancy complications, multiplicity, previous abortion and mode of delivery. Data were split into training (70%) and testing (30%) sets. RF, Gradient Boosting Machines, Support Vector Machines and logistic regression were applied. Model performance was evaluated using accuracy, precision, recall, balanced accuracy and receiver operating characteristic-area under the curve (ROC-AUC). Feature importance and SHapley Additive exPlanations (SHAP) supported interpretability. Results: Among 549 pregnancies, 17 stillbirths occurred. RF outperformed other models, achieving 92% accuracy, 0.95 ROC-AUC and 0.94 balanced accuracy. Maternal age was the strongest predictor, followed by mode of labour, maternal weight, gravidity and delivery mode. Pregnancy complications and antenatal care visits showed moderate importance, while history of pre-eclampsia, previous abortion and multiplicity contributed minimally. SHAP analysis confirmed these findings and explained variable-specific effects on risk. Conclusions: Maternal age emerged as the dominant determinant of stillbirth, with labour and delivery factors and maternal characteristics also contributing. ML models, particularly RF, effectively identified high-risk pregnancies and provided interpretable predictions through SHAP analysis. These findings underscore the potential of ML to support targeted prenatal care and reduce stillbirth risk in low-resource settings.
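The abstract reports balanced accuracy alongside plain accuracy. The sketch below illustrates why that matters when events are rare, using only the full-sample counts given above (17 stillbirths among 549 pregnancies). The "always predict no event" classifier is a hypothetical baseline introduced for illustration and is not one of the study's models.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on events) and specificity (recall on non-events)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


total_pregnancies = 549  # from the abstract
stillbirths = 17         # from the abstract

# Hypothetical baseline: predict "no stillbirth" for every pregnancy.
tp, fn = 0, stillbirths
tn, fp = total_pregnancies - stillbirths, 0

trivial_accuracy = accuracy(tp, fp, tn, fn)
trivial_balanced = balanced_accuracy(tp, fp, tn, fn)
print(f"accuracy = {trivial_accuracy:.3f}, balanced accuracy = {trivial_balanced:.3f}")
```

Because this baseline never detects an event, its balanced accuracy is 0.5 even though its plain accuracy is high, which is the gap that balanced accuracy is designed to expose.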

  • Research Article
  • Cite count: 2
  • 10.3389/fpubh.2025.1659987
From mother to infant: predicting infant temperament using maternal mental health measures and tabular machine learning models
  • Sep 18, 2025
  • Frontiers in Public Health
  • Rawan Alsaad + 4 more

Background: Negative emotionality is a core dimension of infant temperament, characterized by heightened distress, reactivity, and difficulty with self-regulation. It has been consistently associated with later behavioral and emotional difficulties. Emerging evidence suggests that maternal mental health (MMH) in the postpartum period may influence infant temperament. However, few studies have applied machine learning (ML) methods to examine the predictive capacity of MMH profiles for early infant emotional development. Objectives: This study aimed to investigate whether postpartum maternal depression, anxiety, and birth-related trauma, along with sociodemographic factors, can predict infant negative emotionality during the first year postpartum using tabular ML models. Methods: Data were obtained from 410 mother–infant dyads. Infant temperament was assessed using the Negative Emotionality subscale of the Infant Behavior Questionnaire-Revised (IBQ-R). MMH symptoms were measured via the Edinburgh Postnatal Depression Scale (EPDS), the Hospital Anxiety and Depression Scale (HADS), and the City Birth Trauma Scale (City BiTS). Six tabular ML models were trained using MMH and demographic features: Tabular Prior-Data Fitted Network (TabPFN), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Random Forest, and Support Vector Machine (SVM). Performance was evaluated using Receiver Operating Characteristic Area Under The Curve (ROC-AUC), Precision-Recall Area Under the Curve (PR-AUC), F1-score, sensitivity, and specificity. Results: Postpartum MMH symptoms and maternal–infant characteristics moderately predicted infant negative emotionality. LightGBM achieved the highest performance across ROC-AUC (0.76), F1-score (0.72), sensitivity (0.71), and specificity (0.73). TabPFN yielded the highest PR-AUC (0.78). Key predictors included gestational age, infant's age, EPDS score, mother's age, HADS score, and City BiTS score. Conclusions: These findings highlight the potential of ML tools in early identification of at-risk infants and the importance of integrating MMH screening into postnatal care. Such predictive insights can inform timely, personalized interventions that address the unique emotional needs of both mother and infant, ultimately fostering healthier developmental trajectories and enhancing overall family well-being.

  • Research Article
  • Cite count: 2
  • 10.21037/tp-24-278
Constructing a predictive model for early-onset sepsis in neonatal intensive care unit newborns based on SHapley Additive exPlanations explainable machine learning.
  • Nov 1, 2024
  • Translational pediatrics
  • Xuefeng Tan + 8 more

The clinical characteristics of neonatal sepsis (NS) are subtle and non-specific, posing a serious threat to the lives of newborn infants. Early-onset sepsis (EOS) is sepsis that occurs within 72 hours after birth, with a high mortality rate. Identifying key factors of NS and conducting early diagnosis are of great practical significance. Thus, we developed a robust machine learning (ML) model for the early prediction of EOS in neonates admitted to the neonatal intensive care unit (NICU), investigated the pivotal risk factors associated with EOS development, and provided interpretable insights into the model's predictions. A retrospective cohort study was conducted. It considered 668 newborns (EOS and non-EOS) admitted to the NICU of Bozhou People's Hospital from January to December 2023, and excluded 72 newborns born more than three days ago and 166 newborns with more than 30% of medical record data missing. Finally, 430 newborns (EOS and non-EOS) were included in the study. Clinical case data were analyzed, and the dataset was randomly partitioned, allocating 75% for model training and the remaining 25% for testing. Data preprocessing was performed using the R language, and least absolute shrinkage and selection operator (LASSO) regression was used to select salient features, mitigating the risk of overfitting. Six ML models were used to predict the incidence of EOS in neonates. The predictive performance of these models was evaluated using the receiver operating characteristic (ROC) curve and precision-recall (PR) curve. Furthermore, the SHapley Additive exPlanations (SHAP) framework was employed to provide intuitive explanations for the predictions made by the Categorical Boosting (CatBoost) model, which emerged as the top performer. The ROC area under the curve (ROCAUC) of all six ML models (CatBoost, random forest (RF), eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), support vector machine (SVM), and logistic regression (LR)) exceeded 0.900 on the test set. The CatBoost model in particular exhibited superior performance, with favorable outcomes in calibration, decision curve analysis (DCA), and learning curves. Its ROCAUC reached 0.975 and its area under the PR curve (PRAUC) reached 0.947, signifying a high degree of predictive accuracy. Using the SHAP method, seven key features were identified and ranked by their importance: respiratory rate (RR), procalcitonin (PCT), nasal congestion (NC), yellow staining (YS), white blood cell count (WBC), fever, and amniotic fluid turbidity (AFT). By constructing a precision-oriented ML model and using the SHAP method for interpretability, this study identified crucial risk factors for EOS development in neonates. This approach enables early prediction of EOS risk, thereby facilitating timely and targeted clinical interventions for precise diagnosis and treatment.

  • Research Article
  • 10.3390/a18080482
Machine Learning Subjective Opinions: An Application in Forensic Chemistry
  • Aug 4, 2025
  • Algorithms
  • Anuradha Akmeemana + 1 more

Simulated data created in silico using a previously reported method were sampled by bootstrapping to generate data sets for training multiple copies of an ensemble learner (i.e., a machine learning (ML) method). The posterior probabilities of class membership obtained by applying the ensemble of ML models to previously unseen validation data were fitted to a beta distribution. The shape parameters for the fitted distribution were used to calculate the subjective opinion of sample membership into one of two mutually exclusive classes. The subjective opinion consists of belief, disbelief and uncertainty masses. A subjective opinion for each validation sample allows identification of high-uncertainty predictions. The projected probabilities of the validation opinions were used to calculate log-likelihood ratio scores and generate receiver operating characteristic (ROC) curves from which an opinion-supported decision can be made. Three very different ML methods, linear discriminant analysis (LDA), random forest (RF), and support vector machines (SVM), were applied to the two-state classification problem in the analysis of forensic fire debris samples. For each ML method, a set of 100 ML models was trained on data sets bootstrapped from 60,000 in silico samples. The impact of training data set size on opinion uncertainty and ROC area under the curve (AUC) was studied. The median uncertainty for the validation data was smallest for LDA and largest for SVM. The median uncertainty continually decreased as the size of the training data set increased for all ML methods. The AUC for ROC curves based on projected probabilities was largest for the RF model and smallest for the LDA method. The ROC AUC was statistically unchanged for LDA at training data sets exceeding 200 samples; however, the AUC increased with increasing sample size for the RF and SVM methods. The SVM method, the slowest to train, was limited to a maximum of 20,000 training samples. All three ML methods showed increasing performance when the validation data was limited to higher ignitable liquid contributions. An ensemble of 100 RF ML models, each trained on 60,000 in silico samples, performed the best with a median uncertainty of 1.39 × 10⁻² and ROC AUC of 0.849 for all validation samples.
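The step from fitted beta shape parameters to belief, disbelief and uncertainty masses can be sketched as follows. The abstract does not state the exact mapping, so this uses the standard binomial-opinion mapping from subjective logic as an assumption, with a base rate of 0.5 and a non-informative prior weight of 2. The function name `opinion_from_beta` and the example shape parameters are illustrative only.

```python
def opinion_from_beta(alpha, beta, base_rate=0.5, prior_weight=2.0):
    """Map Beta(alpha, beta) to (belief, disbelief, uncertainty, projected probability).

    Assumed mapping (standard subjective logic, not confirmed by the abstract):
    alpha = r + base_rate * W and beta = s + (1 - base_rate) * W, where r and s
    are the amounts of supporting and opposing evidence and W is the prior weight.
    """
    r = alpha - base_rate * prior_weight
    s = beta - (1.0 - base_rate) * prior_weight
    if r < 0 or s < 0:
        raise ValueError("shape parameters are too small for this prior weight")
    total = r + s + prior_weight
    belief = r / total
    disbelief = s / total
    uncertainty = prior_weight / total
    projected = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, projected


# Hypothetical shape parameters for one validation sample.
b, d, u, p = opinion_from_beta(alpha=9.0, beta=3.0)
print(f"belief={b:.3f} disbelief={d:.3f} uncertainty={u:.3f} projected={p:.3f}")
```

Under this mapping the three masses sum to one, and the projected probability equals the mean of the beta distribution, alpha / (alpha + beta). A tightly concentrated fit (large alpha + beta) therefore yields a small uncertainty mass.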

  • Research Article
  • Cite count: 4
  • 10.2196/70621
Optimizing Feature Selection and Machine Learning Algorithms for Early Detection of Prediabetes Risk: Comparative Study
  • Jul 31, 2025
  • JMIR Bioinformatics and Biotechnology
  • Mahmoud B Almadhoun + 1 more

Background: Prediabetes is an intermediate stage between normal glucose metabolism and diabetes and is associated with increased risk of complications like cardiovascular disease and kidney failure. Objective: It is crucial to recognize individuals with prediabetes early in order to apply timely intervention strategies to slow or prevent diabetes development. This study aims to compare the effectiveness of machine learning (ML) algorithms in predicting prediabetes and identifying its key clinical predictors. Methods: Multiple ML models are evaluated in this study, including random forest, extreme gradient boosting (XGBoost), support vector machine (SVM), and k-nearest neighbors (KNNs), on a dataset of 4743 individuals. For improved performance and interpretability, key clinical features were selected using LASSO (Least Absolute Shrinkage and Selection Operator) regression and principal component analysis (PCA). To optimize model accuracy and reduce overfitting, we used hyperparameter tuning with RandomizedSearchCV for XGBoost and random forest, and GridSearchCV for SVM and KNN. SHAP (Shapley Additive Explanations) was used to assess model-agnostic feature importance. To resolve data imbalance, SMOTE (Synthetic Minority Oversampling Technique) was applied to ensure reliable classifications. Results: A cross-validated ROC-AUC (receiver operating characteristic area under the curve) score of 0.9117 highlighted the robustness of random forest in generalizing across datasets among the models tested. XGBoost followed closely, providing balanced accuracy in distinguishing between normal and prediabetic cases. While SVMs and KNNs performed adequately as baseline models, they exhibited limitations in sensitivity. The SHAP analysis indicated that BMI, age, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol emerged as the key predictors across models. The performance was significantly enhanced through hyperparameter tuning; for example, the ROC-AUC for SVM increased from 0.813 (default) to 0.863 (tuned). PCA kept 12 components while maintaining 95% of the variance in the dataset. Conclusions: This research demonstrates that optimized ML models, especially random forest and XGBoost, are effective tools for assessing early prediabetes risk. Combining SHAP analysis with LASSO and PCA enhances transparency, supporting their integration in real-time clinical decision support systems. Future directions include validating these models in diverse clinical settings and integrating additional biomarkers to improve prediction accuracy, offering a promising avenue for early intervention and personalized treatment strategies in preventive health care.
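The core idea behind SMOTE, mentioned above as the remedy for class imbalance, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that interpolation step only. The function name `smote_like`, the parameters and the toy points are assumptions, and this is not the library implementation the authors would have used.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point (Euclidean distance)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (for example scaled BMI and age).
toy_minority = [(0.2, 0.4), (0.3, 0.5), (0.25, 0.7), (0.4, 0.45)]
synthetic_points = smote_like(toy_minority, n_new=5)
print(synthetic_points)
```

Each synthetic point lies on a line segment between two real minority samples, so the new points stay inside the region the minority class already occupies.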

  • Research Article
  • 10.1200/op.2023.19.11_suppl.404
Prospective validation of machine learning-based approaches to predict potentially preventable emergency visits and hospitalizations.
  • Nov 1, 2023
  • JCO Oncology Practice
  • Isabel D Friesner + 11 more

404 Background: Patients undergoing cancer treatment are at risk for unplanned acute care. Early identification of at-risk patients could enable preventative interventions, reducing costs and treatment delays. To address this, the Centers for Medicare & Medicaid Services developed the Chemotherapy Measure (OP-35) to monitor potentially preventable acute care utilization during outpatient treatment. We previously developed machine-learning (ML) models using three approaches: least absolute shrinkage selection operator (LASSO), random forest (RF), and gradient boosted trees (GBT). The models predict risk of an OP-35 qualifying acute care event in the 30 days following a systemic therapy infusion and had good performance in the internal validation cohort, with GBT demonstrating the best predictive ability (receiver operating characteristic area under the curve (ROC-AUC) = 0.805). The aim of this study was to prospectively validate these models in patients undergoing systemic therapy at a single institution. Methods: All three models are being prospectively validated on systemic therapy infusions, including chemo, immuno, biologic, hormone, research and targeted therapies. Assuming a 2% event rate based on our prior data, with an alpha = 0.05 and 84% power, to detect an ROC-AUC of 0.75, validation will run for a total of 8000 infusions, from May 1 to August 21, 2023. We present early findings from May 1 to May 14, 2023 for this prospective validation. Model performance was assessed for calibration based on Brier score and predictive ability based on ROC-AUC. Sensitivity and specificity were calculated for the model with the highest ROC-AUC based on a previously determined Youden’s J statistic. Results: This study included 1096 systemic therapy treatments across 957 patients. 21 (1.9%) infusions resulted in an OP-35 qualifying acute care event (3 emergency department visits and 18 hospitalizations). Most events were due to pain, anemia, sepsis, and/or dehydration. All models had good performance, with GBT demonstrating the greatest predictive ability (ROC-AUC = 0.78 [0.67-0.88]) compared to LASSO (0.76 [0.65-0.85]) and RF (0.70 [0.56-0.81]). GBT also had good calibration (Brier score = 0.017 [0.011-0.025]), followed closely by LASSO (0.018 [0.011-0.026]) and RF (0.019 [0.012-0.026]). A Youden-based cut-off of 0.0249 corresponded to a validation sensitivity of 77.6% and specificity of 61.9%. Conclusions: Early prospective validation of ML models demonstrates accurate predictions of OP-35 qualifying acute care events on a per-infusion basis. The use of computational tools to identify patients at risk for unplanned acute care would enable preventative interventions, reducing costs and treatment disruptions. Prospective validation is ongoing and more comprehensive results will be presented at the meeting.
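Youden's J statistic, used above to fix the operating cut-off, is sensitivity plus specificity minus one. The sketch below shows a simple way to pick the threshold that maximizes J over candidate cut-offs. The function names and the toy risk scores are illustrative assumptions; only the sensitivity and specificity passed to `youden_j` at the end come from the abstract.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_threshold(labels, scores):
    """Return (threshold, J) maximizing Youden's J; predictions are score >= threshold."""
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = youden_j(tp / (tp + fn), tn / (tn + fp))
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Hypothetical per-infusion risk scores and outcomes (1 = acute care event).
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.005, 0.010, 0.015, 0.020, 0.025, 0.030, 0.040, 0.060]
toy_cutoff, toy_j = best_threshold(toy_labels, toy_scores)

# J implied by the sensitivity and specificity reported in the abstract.
reported_j = youden_j(0.776, 0.619)
print(toy_cutoff, round(toy_j, 3), round(reported_j, 3))
```

Note that in the abstract the cut-off was determined previously and then applied to the validation data, whereas this sketch selects and evaluates the cut-off on the same toy data for brevity.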

  • Research Article
  • Cite count: 2
  • 10.3390/ph18060776
PLASMOpred: A Machine Learning-Based Web Application for Predicting Antimalarial Small Molecules Targeting the Apical Membrane Antigen 1–Rhoptry Neck Protein 2 Invasion Complex
  • May 23, 2025
  • Pharmaceuticals
  • Eugene Lamptey + 10 more

Objective: Falciparum malaria is a major global health concern, affecting more than half of the world’s population and causing over half a million deaths annually. Red cell invasion is a crucial step in the parasite’s life cycle, where the parasite invades human erythrocytes to sustain infection and ensure survival. Two parasite proteins, Apical Membrane Antigen 1 (AMA-1) and Rhoptry Neck Protein 2 (RON2), are involved in tight junction formation, which is an essential step in parasite invasion of the red blood cell. Targeting the AMA-1 and RON2 interaction with inhibitors halts the formation of the tight junction, thereby preventing parasite invasion, which is detrimental to parasite survival. This study leverages machine learning (ML) to predict potential small molecule inhibitors of the AMA-1–RON2 interaction, providing putative antimalaria compounds for further chemotherapeutic exploration. Method: Data were retrieved from the PubChem database (AID 720542), comprising 364,447 inhibitors and non-inhibitors of the AMA-1–RON2 interaction. The data were processed by computing Morgan fingerprints and divided into training and testing sets with an 80:20 ratio, and the classes in the training data were balanced using the Synthetic Minority Oversampling Technique. The five ML models developed were Random Forest (RF), Gradient Boost Machines (GBMs), CatBoost (CB), AdaBoost (AB) and Support Vector Machine (SVM). The performances of the models were evaluated using accuracy, F1 score, and receiver operating characteristic-area under the curve (ROC-AUC) and validated using held-out data and a y-randomization test. An applicability domain analysis was carried out using the Tanimoto distance with a threshold set at 0.04 to ascertain the sample space where the models predict with confidence. Results: The GBM model emerged as the best, achieving 89% accuracy and a ROC-AUC of 92%. CB and RF had accuracies of 88% and 87%, and ROC-AUC scores of 93% and 91%, respectively. 
Conclusions: Experimentally validated inhibitors of the AMA-1–RON2 interaction could serve as starting blocks for the next-generation antimalarial drugs. The models were deployed as a web-based application, known as PLASMOpred.
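The Tanimoto distance used for the applicability domain analysis can be sketched on binary fingerprints represented as sets of "on" bit positions. The abstract gives the 0.04 threshold but not the decision rule, so the rule below (a query is in the domain if its nearest training fingerprint is within the threshold) is an assumption, as are the function names and the toy fingerprints.

```python
def tanimoto_distance(bits_a, bits_b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(bits_a & bits_b) / len(union)


def in_applicability_domain(query, training, threshold=0.04):
    """Assumed rule: nearest training fingerprint lies within the distance threshold."""
    nearest = min(tanimoto_distance(query, fp) for fp in training)
    return nearest <= threshold


# Hypothetical fingerprints; real Morgan fingerprints have many more bits.
toy_training = [{1, 2, 3, 4}, {10, 11, 12}]
seen_like = {1, 2, 3, 4}        # identical to a training fingerprint
novel_like = {1, 2, 3, 4, 5}    # shares 4 of 5 bits with the nearest one

toy_distance = tanimoto_distance(novel_like, toy_training[0])
print(toy_distance, in_applicability_domain(seen_like, toy_training),
      in_applicability_domain(novel_like, toy_training))
```

With a threshold as tight as 0.04, only queries that are nearly identical to some training compound count as inside the domain under this assumed rule.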

  • Research Article
  • 10.1182/blood-2024-202244
Forecasting Hematological Activity in Antiphospholipid Syndrome Using Predictive Models
  • Nov 5, 2024
  • Blood
  • Amaya Llorente + 3 more


  • Preprint Article
  • 10.5194/ems2025-562
Hydrological modelling using machine and deep learning models across multiple case studies
  • Jul 16, 2025
  • Majid Niazkar + 3 more

Machine learning (ML) and deep learning (DL) models can play an important role when it comes to modelling complicated processes. Such capability is necessary for hydrological and climate-related applications. Generally, ML models utilize precipitation and temperature time series of a basin as input to develop a lumped rainfall-runoff model to simulate streamflow at the basin outlet. However, when it is divided into several sub-basins, Graph Neural Networks (GNN) can consider each sub-basin as a node and link them together using a connectivity matrix to account for spatial variations of hydroclimatic variables. In this study, GNN and various ML models with different types of architecture, ranging from neural networks, tree-based structure, and gradient boosting, were exploited for daily streamflow simulation over different case studies. For each case study, the basin was divided into a few sub-basins for which daily precipitation and temperature data were aggregated and used as input. For training GNN, the connection matrix of sub-basins was also used as input. Basically, 75% of historical records were utilized to train GNN and different ML models, e.g., artificial neural networks, support vector machine, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Category Boosting (CatBoost), while the rest was used for testing. Streamflow simulation was conducted with/without considering seasonality impact and lag times. The obtained results clearly demonstrate that considering seasonality and time lags can enhance accuracy of streamflow predictions based on Kling–Gupta efficiency (KGE). Furthermore, GNN with seasonality impact and time lags achieved promising results across different case studies with KGE>0.85 for training and KGE>0.59 for testing data, respectively. Among ML models, boosting models, e.g., LightGBM and XGBoost, performed slightly better than other ML models. 
Finally, this comparative analysis provides valuable insights for ML/DL applications in climate change impact assessments. Acknowledgements: This research work was carried out as part of the TRANSCEND project with funding received from the European Union Horizon Europe Research and Innovation Programme under Grant Agreement No. 10108411.
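For readers unfamiliar with the Kling-Gupta efficiency (KGE) used above, the sketch below computes it in its original form: KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation between simulated and observed flows, alpha is the ratio of their standard deviations and beta is the ratio of their means. The abstract does not say which KGE variant was used, so this choice is an assumption, and the streamflow values are made up.

```python
import math
import statistics


def kling_gupta_efficiency(observed, simulated):
    """KGE in its original three-component form (assumed variant)."""
    mean_obs = statistics.fmean(observed)
    mean_sim = statistics.fmean(simulated)
    std_obs = statistics.pstdev(observed)
    std_sim = statistics.pstdev(simulated)
    covariance = statistics.fmean(
        [(o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated)]
    )
    r = covariance / (std_obs * std_sim)  # Pearson correlation
    alpha = std_sim / std_obs             # variability ratio
    beta = mean_sim / mean_obs            # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical daily streamflow values (arbitrary units).
toy_observed = [1.0, 2.0, 3.0, 4.0]
kge_perfect = kling_gupta_efficiency(toy_observed, toy_observed)
kge_doubled = kling_gupta_efficiency(toy_observed, [2.0 * q for q in toy_observed])
print(round(kge_perfect, 4), round(kge_doubled, 4))
```

A perfect simulation scores 1. A simulation that is perfectly correlated but doubles every flow is penalized through both the variability and bias terms, which is why KGE is often preferred over correlation alone.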

  • Research Article
  • Cite count: 4
  • 10.1186/s41043-025-01095-8
Prevalence, associated factors, and machine learning-based prediction of depression, anxiety, and stress among university students: a cross-sectional study from Bangladesh
  • Oct 14, 2025
  • Journal of Health, Population, and Nutrition
  • Md Emran Hasan + 10 more

Background: Mental health challenges are a growing global public health concern, with university students at elevated risk due to academic and social pressures. Although several studies have examined mental health among Bangladeshi students, few have integrated conventional statistical analyses with advanced machine learning (ML) approaches. This study aimed to assess the prevalence and factors associated with depression, anxiety, and stress among Bangladeshi university students, and to evaluate the predictive performance of multiple ML models for those outcomes. Methods: A cross-sectional survey was conducted in February 2024 among 1697 students residing in halls at two public universities in Bangladesh: Jahangirnagar University and Patuakhali Science and Technology University. Data on sociodemographic, health, and behavioral factors were collected via structured questionnaires. Mental health outcomes were measured using the validated Bangla version of the Depression, Anxiety, and Stress Scale-21 (DASS-21). Statistical analyses included chi-square tests and binary logistic regression, while seven ML models, including K-Nearest Neighbors (KNN), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Logistic Regression (LR), and Support Vector Machine (SVM), were employed to predict mental health outcomes. Results: The prevalence of depression, anxiety, and stress was 56.9%, 69.5%, and 32.2%, respectively. Significant associated factors for depression included unfriendly family relationships, enrollment in commerce, and cigarette smoking. Female gender, unfriendly family relationships, academic year, and cigarette smoking were significant factors for stress. No significant factors were identified for anxiety. Among ML models, SVM achieved the highest accuracy for depression prediction (accuracy = 0.5693; precision = 0.7560; log loss = 0.6847), LR for anxiety (accuracy = 0.6948; precision = 0.7881), and CatBoost for stress (accuracy = 0.6706; precision = 0.6454; F1-score = 0.5777; log loss = 0.6284). Feature importance analyses highlighted faculty of study and relation with family as the top predictors. ROC-AUC values indicated moderate discriminatory performance (all ≥ 0.5). Conclusions: Integrating machine learning with conventional analyses enhances the identification and prediction of factors associated with depression, anxiety, and stress among university students. These findings support the implementation of campus-based mental health screening, accessible counseling, and peer support programs, and highlight the value of data-driven approaches for developing targeted university mental health policies.
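Log loss, reported above next to accuracy and precision, is the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below is a minimal implementation for binary outcomes; the function name, clipping constant and toy labels are illustrative assumptions. As a reference point, a predictor that always outputs 0.5 scores ln 2 (about 0.693) regardless of the labels.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Binary log loss: -mean(y * ln(p) + (1 - y) * ln(1 - p))."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Hypothetical outcomes (1 = condition present).
toy_labels = [0, 1, 1, 0]
coin_flip_loss = log_loss(toy_labels, [0.5, 0.5, 0.5, 0.5])
confident_loss = log_loss(toy_labels, [0.1, 0.9, 0.8, 0.2])
print(round(coin_flip_loss, 4), round(confident_loss, 4))
```

Lower values are better, and confident predictions that turn out correct pull the loss well below the 0.5-probability reference.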

  • Research Article
  • 10.1200/jco.2025.43.5_suppl.647
Machine learning model integrating CT radiomics and circulating microRNAs to predict residual disease histology in metastatic non-seminoma testicular cancer (mNSTC).
  • Feb 10, 2025
  • Journal of Clinical Oncology
  • Guliz Ozgun + 14 more

647 Background: The primary treatment of most mNSTC is chemotherapy followed by surgery if the residual disease (RD) is >1 cm. However, conventional imaging lacks the specificity to characterize the tissue, often leading to overtreatment. This study hypothesizes that integrating CT-driven radiomics features with plasma miR371 and miR375 will enhance the predictive accuracy of Machine Learning (ML) models to predict teratoma, viable germ cell (vGCT) and fibrosis/necrosis (F/N) in mNSTC patients with RD. Methods: 111 lesions from 52 patients, including residual teratoma (n=57), F/N (n=33), vGCT (n=10), and additional seminoma (n=11) for training purposes, were included and split into training (N=78) and test cohorts (N=33). Lesions were lymph nodes (n=87), lung (n=21), and brain (n=3) with a median size of 1.6 cm (Q1-Q3 interval=1.2-2.73 cm). 3D Slicer version 5.6.1 was used to segment the RD > 1 cm (short axis) and extract radiomics features. Plasma miRNA levels before resection were measured by RT-PCR. Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), and CatBoost (CB) ML models were evaluated to define the operating characteristics of radiomics alone (R-only) and in combination with miR371 (371) and/or miR375 (375) levels in predicting teratoma, vGCT and F/N. Results: For predicting teratoma, the best models were RF (R+375 and R+371+375), CB (R+371+375), and GB (R+371 and R+371+375). While adding miR371 or miR375 to R-only slightly improved AUC across models, the best results were achieved with the R+375+371 dataset. CB achieved AUCs ranging from 0.94 to 0.97 in training and 0.81 to 0.93 in test sets, with its highest AUC of 0.93 (95% CI: 0.78-0.97) on the R+375+371 dataset to differentiate all three classes. Similarly, GB demonstrated strong performance, achieving its highest AUC of 0.93 (95% CI: 0.79-0.96) on the R+375+371 dataset (Table). 
Conclusions: Integration of plasma miR371, miR375 and radiomics improved accuracy of predicting histologies across all ML models. These methods could be used to characterize the histology of RD in mNSTC patients to better inform treatment decisions. Further refinement, including incorporation of histological findings of the primary tumor, will be reported.

AUC values of different ML algorithms on training and test sets.

Training set (AUC ± SD)
Model   R           R+375       R+371       R+375+371
RF      0.93±0.05   0.95±0.04   0.95±0.03   0.96±0.04
SVM     0.84±0.06   0.84±0.09   0.89±0.11   0.89±0.09
GB      0.94±0.04   0.91±0.08   0.95±0.05   0.97±0.03
CB      0.95±0.03   0.94±0.03   0.94±0.04   0.97±0.03

Test set (AUC, 95% CI)
Model   R                  R+375              R+371              R+375+371
RF      0.8 (0.59-0.89)    0.85 (0.72-0.93)   0.87 (0.76-0.95)   0.91 (0.78-0.95)
SVM     0.72 (0.54-0.80)   0.74 (0.56-0.82)   0.83 (0.69-0.92)   0.84 (0.76-0.94)
GB      0.84 (0.61-0.96)   0.89 (0.77-0.97)   0.89 (0.79-0.96)   0.93 (0.79-0.96)
CB      0.81 (0.6-0.93)    0.86 (0.73-0.94)   0.89 (0.78-0.97)   0.93 (0.78-0.97)

  • Research Article
  • Cite count: 5
  • 10.1002/ijgo.70264
Determining the risk of gestational diabetes using machine learning: A study on first-trimester PAPP-A and β-hCG data.
  • Jun 2, 2025
  • International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics
  • Uğurcan Zorlu + 7 more

To evaluate the predictive potential of first-trimester biomarkers, namely pregnancy-associated plasma protein-A (PAPP-A) and free β-human chorionic gonadotropin (β-hCG), combined with maternal body mass index (BMI), using machine learning (ML) algorithms for the early detection of gestational diabetes mellitus (GDM). A retrospective cohort study was conducted with 400 pregnant women who underwent first-trimester screening at Ankara Bilkent City Hospital. Demographic, clinical, and biochemical data, including PAPP-A, free β-hCG, and BMI, were collected. ML models, including random forest, gradient boosting, and logistic regression, were employed to predict GDM risk. Data standardization, model training, and performance evaluation were performed using metrics such as accuracy, F1 score, and receiver operating characteristics area under the curve (ROC-AUC). The combination of PAPP-A, free β-hCG, and BMI significantly enhanced GDM prediction accuracy across all ML models. Gradient boosting achieved the highest performance with an ROC-AUC of 0.715 and an accuracy of 71.3%, demonstrating the robust predictive value of these variables. PAPP-A alone showed limited predictive capacity (ROC-AUC: 0.632), but its integration with BMI improved model performance substantially. Cut-off analyses identified key thresholds for PAPP-A (<1.02) and BMI (>26.12) for effective risk stratification. This study underscores the potential of integrating first-trimester biomarkers with ML algorithms for early GDM prediction. By using routinely collected clinical data, this approach offers a cost-effective and scalable solution for improving maternal and neonatal health outcomes. Future research should validate these findings in diverse populations and explore the incorporation of additional biomarkers to further refine predictive models.

  • Research Article
  • Cite Count Icon 1
  • 10.4108/ew.7114
Comparison of Machine Learning and Deep Learning Models Performance in predicting wind energy
  • Jul 21, 2025
  • EAI Endorsed Transactions on Energy Web
  • Saswati Rakshit + 1 more

The prediction of wind energy generation is important for enhancing the performance and dependability of renewable energy systems, given the rising demand for wind-generated electricity and the growing competitiveness of wind energy technology. This study leverages machine learning (ML) models together with statistical and deep-learning-based time-series forecasting models to enhance the accuracy of wind energy predictions. The analysis includes nine ML models (Linear Regression, Random Forests (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), AdaBoost, XGBoost, Support Vector Regression (SVR), and Neural Networks) as well as four time-series forecasting models: ARIMA, Temporal Convolutional Networks (TCNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRU). Each ML model underwent rigorous cross-validation to ensure optimal performance. The assessment criteria were the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the R² score. Among the nine ML models, Random Forests, GBM, and KNN consistently provided superior accuracy and robustness, making them the top choices for wind energy prediction, whereas the performance of linear regression, SVM, and SVR was very poor on the dataset considered. In the experiments, Random Forest, GBM, and KNN showed the best performance, with low MSE values of 0.77, 1.95, and 1.51 respectively, while the other models had MSEs above 7.5, with AdaBoost reaching 30. Their RMSEs (0.88, 1.40, 1.23) and MAEs (0.093, 0.73, 0.10) also indicate strong predictive accuracy compared to the rest. In the time-series forecasting experiments, TCN, LSTM, and GRU networks showed strong capabilities in capturing temporal dependencies and trends within the wind energy data. Visualization techniques were employed to compare model performances comprehensively, providing clear insights into their predictive power.
Therefore, the present study offers a robust framework for researchers and practitioners aiming to apply machine learning and time-series forecasting to renewable energy prediction.
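
The error figures quoted in this abstract can be cross-checked with one line of arithmetic, since RMSE is by definition the square root of MSE. The sketch below is only an illustrative consistency check: the MSE values are taken from the abstract, while the variable names and the two-decimal rounding are assumptions made here, not part of the study.

```python
import math

# MSE values as quoted in the abstract (Random Forest, GBM, KNN).
reported_mse = {"Random Forest": 0.77, "GBM": 1.95, "KNN": 1.51}

# RMSE is the square root of MSE; rounding to two decimals is an
# assumption made here to match the precision used in the abstract.
rmse = {name: round(math.sqrt(mse), 2) for name, mse in reported_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f}")
```

The computed values agree with the RMSEs quoted in the abstract (0.88, 1.40, 1.23).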

  • Research Article
  • Cite Count Icon 40
  • 10.3390/ijgi9040276
Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction
  • Apr 23, 2020
  • ISPRS International Journal of Geo-Information
  • Panagiotis Tziachris + 4 more

In the current paper we assess different machine learning (ML) models and hybrid geostatistical methods in the prediction of soil pH using digital elevation model derivatives (environmental covariates) and co-located soil parameters (soil covariates). The study was located in the area of Grevena, Greece, where 266 disturbed soil samples were collected from randomly selected locations and analyzed in the laboratory of the Soil and Water Resources Institute. The models assessed were random forests (RF), random forests kriging (RFK), gradient boosting (GB), gradient boosting kriging (GBK), neural networks (NN), and neural networks kriging (NNK). Multiple linear regression (MLR), ordinary kriging (OK), and regression kriging (RK), although not ML models, were included for comparison. Both the GB and RF models presented the best results in the study, with NN a close second. The introduction of OK to the ML models’ residuals did not have a major impact. Classical geostatistical or hybrid geostatistical methods without ML (OK, MLR, and RK) exhibited worse prediction accuracy compared to the models that included ML. Furthermore, different implementations (methods and packages) of the same ML models were also assessed. Regarding RF and GB, the different implementations that were applied (ranger-ranger, randomForest-rf, xgboost-xgbTree, xgboost-xgbDART) led to similar results, whereas in NN, the differences between the implementations used (nnet-nnet and nnet-avNNet) were more distinct. Finally, ML models tuned through a random search optimization method were compared with the same ML models with their default values. The results showed that the optimization process improved predictions only where the ML algorithms required a large number of hyperparameters to be tuned and there was a significant difference between the default values and the optimized ones, as in the case of GB and NN, but not RF.
In general, the current study concluded that although RF and GB presented approximately the same prediction accuracy, RF had more consistent results, regardless of different packages, different hyperparameter selection methods, or even the inclusion of OK in the ML models’ residuals.

  • Research Article
  • 10.1155/ppc/8308389
Mental and Behavioral Factors Associated With Food Addiction Among University Students: A Bangladeshi Study
  • Jan 1, 2025
  • Perspectives in Psychiatric Care
  • Pronab Das + 8 more

Background: Food addiction, characterized by the compulsive consumption of highly palatable foods, poses significant health risks, particularly among university students. This study investigates the prevalence of food addiction among Bangladeshi university students and its associations with mental health (depression, anxiety, stress, and insomnia) and behavioral factors (smoking, drug, alcohol use, and pornography consumption). Machine learning (ML) models were applied to enhance predictive accuracy. Methods: A cross‐sectional survey was conducted among 1697 participants across two Bangladeshi universities. Food addiction was assessed using the Modified Yale Food Addiction Scale 2.0 (mYFAS 2.0). Associations were examined using logistic regression and subgroup analyses by gender. Six ML models (K‐nearest neighbors (KNN), support vector machine (SVM), random forest (RF), gradient boosting machine (GBM), XGBoost, and CatBoost) were employed to improve classification performance. Results: Overall, 13% of students met the criteria for food addiction, with higher prevalence among males (14.8%) than females (10.4%). In adjusted models, anxiety (AOR = 2.44, 95% CI: 1.43–4.16), stress (AOR = 1.74, 95% CI: 1.18–2.58), and pornography use (AOR = 1.74, 95% CI: 1.12–2.69) were significant predictors. Subgroup analyses showed that anxiety, stress, and pornography use were significant predictors only among males. Among ML models, KNN achieved the highest accuracy (85.3%), while RF demonstrated the best AUC‐ROC (0.697), confirming their utility in identifying at‐risk individuals. Conclusions: Food addiction affects a notable proportion of Bangladeshi university students and is strongly linked with anxiety, stress, and pornography use, particularly among males. Interventions should include cognitive‐behavioral therapy and stress management programs, digital hygiene education, and nutritional counseling tailored to student populations.
ML‐based predictive models, such as RF and CatBoost, may be integrated into campus health systems to support early identification and personalized interventions.
