MODELLING POVERTY STATUS IN ANAMBRA STATE: A COMPARATIVE ANALYSIS OF MACHINE LEARNING CLASSIFIERS

Abstract

Poverty remains a pressing socio-economic issue in Anambra State, Nigeria, necessitating data-driven strategies for accurate assessment and policy action. This study applies machine learning techniques to model poverty status using socio-economic variables, including age, satisfaction level, perception of poverty trends over the past eight years, choice of health facility, source of fuel, and educational attainment. The analysis uses secondary data from the Anambra Bureau of Statistics Poverty Index Survey 2021, comprising approximately 2,500 households across 188 communities. Three classification algorithms, Random Forest (RF), Support Vector Machines (SVM), and Gradient Boosting (GB), were employed to estimate poverty status and were compared using the following performance metrics: accuracy, precision, recall, F1-score, Area Under the Curve (AUC), Mean Squared Error (MSE), and R-squared. The study’s objectives were to: (1) identify key socio-economic determinants of poverty, (2) apply RF, SVM, and GB models to classify poverty status, and (3) determine the most effective classifier based on predictive performance. Empirical results showed that the Gradient Boosting model had the highest classification accuracy (92.3%), followed by RF (89.7%) and SVM (85.4%). F1-scores ranged from 0.81 to 0.91, with GB outperforming the other models, which the study attributes to its superior handling of complex, non-linear data patterns. Feature importance analysis revealed that perception of poverty rate and choice of health facility were the most influential predictors, followed by educational qualification and fuel source. These findings demonstrate the value of machine learning in socio-economic research and advocate for its integration into real-time poverty monitoring and targeted policy interventions in Anambra State.
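
As a rough illustration of the threshold-based metrics named in the abstract above (accuracy, precision, recall, F1-score), the following minimal sketch computes them from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (poor = positive class).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Comparing several classifiers would mean calling this once per model on the same held-out households; how the study split its data is not stated in the abstract.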

Similar Papers
  • Research Article
  • 10.11576/seejph-4430
Factors influencing the choice of facilities among enrolees of a prepayment scheme in Ibadan, Southwest Nigeria
  • May 11, 2021
  • David Ayobami Adewole + 1 more

Aims: Factors that influence the personal choice of a health care facility among health care consumers vary. Currently, what influences the choice of health facilities among enrollees under the National Health Insurance Scheme (NHIS) is not known. This study aimed to assess what influences the choice of facilities in the NHIS of Nigeria. Methods: This was a descriptive cross-sectional study conducted among enrollees in selected NHIS facilities in the 11 Local Government Areas (LGAs) of Ibadan, Nigeria. A total of 432 enrollees were selected and were interviewed. A WHO-USAID semi-structured interviewer-administered questionnaire was used to obtain relevant data. Data collection was between October and December 2019. Data were analyzed using STATA version 12.0 (α = 0.05). Results: At unadjusted OR, older respondents (OR 3.24, CI = 2.52-4.18, p < 0.0001), and those who had attained the tertiary level of education (OR 3.30, CI 2.57-4.23, p < 0.0001) were more likely to make a personal choice of health care facilities. A similar pattern was observed among respondents who were in the high socioeconomic group (OR 4.10, CI 3.01-5.59, p < 0.0001). However, at Adjusted OR, only high socio-economic status was a predictor of personal choice of health care facility (OR 1.92, CI 1.21-3.05, p = 0.005). Conclusion: This study is suggestive that a need for and the ability to afford the cost of care influence the choice of health facilities. Policies that promote health literacy in the general populace will enhance the capability of individuals to make a personal choice of health facilities. Stakeholders should prioritize this for policy.
Recommended citation: David A. Adewole, Temitope Ilori. Factors influencing the choice of facilities among enrolees of a prepayment scheme in Ibadan, Southwest Nigeria. Acknowledgments: The authors wish to acknowledge study participants for permission to interview them in the course of the data collection of this study. Authors' contributions: David Adewole conceived and designed the study. Temitope Ilori did data collection and analysis. Both authors contributed equally to the manuscript write-up. The two authors also read through the manuscript draft the second time and agreed to the final manuscript. Conflict of interests: None declared.
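
The unadjusted odds ratios with confidence intervals reported in this abstract are the kind of quantity that can be computed from a 2x2 table. The sketch below uses the standard Wald (Woolf) interval on the log scale; the counts are hypothetical, and the adjusted ORs in the abstract would come from a regression model rather than from this calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 for a 95% interval)
    """
    odds = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds) - z * se)
    upper = math.exp(math.log(odds) + z * se)
    return odds, lower, upper


# Hypothetical counts, not the survey's data.
odds, lower, upper = odds_ratio_ci(30, 20, 15, 35)
print(f"OR = {odds:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```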

  • Research Article
  • Cite count: 2
  • 10.1186/s12891-025-08710-z
Machine learning–based survival models for predicting rehospitalization of older hip fracture patients: a retrospective cohort study
  • May 8, 2025
  • BMC Musculoskeletal Disorders
  • Juhan Oh + 4 more

Purpose: To evaluate the role of machine learning–based survival models in predicting rehospitalization after hip fractures, to help reduce the burden on the healthcare system. Methods: This retrospective cohort study examined 718 patients with hip fractures hospitalized at the Daejeon Eulji Medical Center between January 2020 and June 2022. Demographic and clinical variables, and rehospitalization data were collected at 6 weeks and 3, 6, 12, and 24 months. Cox proportional hazards (CoxPH), random survival forest (RSF), gradient boosting (GB), and fast survival support vector machine (SVM) models were developed. Model performance was assessed using the concordance index (c-index), area under the curve (AUC), and Kaplan–Meier survival curves. Feature importance was analyzed using permutation importance, with the best model selected based on overall performance. Results: Hyperparameter tuning optimized the models. The GB model had the highest mean AUC of 0.868, followed by the RSF (0.785), SVM (0.763), and CoxPH (0.736) models. Feature importance analysis highlighted femoral neck T-score, age, body mass index, operation time, compression fracture, and total calcium as significant predictors. Feature selection improved the c-index for the RSF model from 0.742 to 0.874 and CoxPH model from 0.717 to 0.915; the GB and SVM models exhibited a c-index decline post-feature selection. The GB and RSF models predicted lower rehospitalization probabilities than Kaplan–Meier estimates; the CoxPH model’s predictions were closely aligned with the observed data. Conclusions: The effect of feature selection on model performance highlights the need for comprehensive variable selection and model evaluation strategies to improve predictive accuracy.
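
The concordance index used in this abstract can be illustrated with a small pure-Python sketch of Harrell's c-index. This is a simplified version (ties in time are ignored, ties in risk score count as half) on made-up data, not the implementation the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's observed time. It is concordant when the subject with
    the earlier event also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Made-up follow-up times (months), event flags (1 = rehospitalized,
# 0 = censored) and model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```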

  • Research Article
  • 10.3389/fmed.2025.1713906
CT-based subchondral bone and clinical predictors of long-term total ankle arthroplasty outcomes
  • Jan 12, 2026
  • Frontiers in Medicine
  • Wei Ji + 1 more

Objective: This study aimed to develop a machine learning-based predictive model for personalized long-term prognosis assessment in patients undergoing total ankle arthroplasty (TAA) by integrating preoperative computed tomography (CT)-derived subchondral bone structural parameters with clinical indicators. Methods: A retrospective cohort study involving 340 TAA patients was divided into training (n = 238, 70%) and validation (n = 102, 30%) sets through stratified random sampling, ensuring the outcome distribution was preserved. Radiographic features and clinical metrics were systematically collected. Univariate analysis was conducted to identify variables associated with poor prognosis in the training set, followed by feature reduction using the least absolute shrinkage and selection operator (LASSO) regression. To determine independent risk factors, multivariable Cox proportional hazards regression (Cox regression) was used. Three machine learning models, Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosting (GB), were constructed using Python 3.8.5. Model performance was evaluated using receiver operating characteristic (ROC) curve analysis. Results: Baseline characteristics showed no statistically significant differences between training and validation sets (p > 0.05). Univariate analysis indicated that subchondral bone mineral density (BMD), trabecular separation (Tb. Sp), talar tilt angle, Charlson Comorbidity Index (CCI), and preoperative talar necrosis volume were significantly associated with the need for prosthesis revision surgery. In the multivariable Cox regression, Tb. Sp, talar tilt angle, and preoperative talar necrosis volume emerged as independent risk factors for sustained clinical deterioration. Conversely, subchondral BMD and CCI were identified as protective factors. In the validation set, the area under the ROC (AUC) for the RF, SVM, and GB models was 0.897, 0.790, and 0.815, respectively. Pairwise comparisons using the DeLong test revealed a statistically significant difference in AUC between the RF and SVM models (ΔAUC = 0.107, p = 0.032) and between the RF and GB models (ΔAUC = 0.082, p = 0.041). In contrast, the difference between the SVM and GB models was not statistically significant (ΔAUC = 0.025, p = 0.597). Conclusion: The RF model that incorporates preoperative CT-quantified subchondral bone parameters and clinical indicators effectively predicts long-term adverse outcomes in TAA patients. The top three predictive features identified are subchondral BMD, Tb. Sp, and preoperative talar necrosis volume.
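
The AUC values compared in this abstract can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below computes AUC that way on made-up labels and scores; it does not implement the DeLong test, which additionally needs variance estimates for the AUC difference.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up outcome labels (1 = adverse outcome) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```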

  • Research Article
  • Cite count: 10
  • 10.1139/cjfr-2020-0330
Predicting stand attributes of loblolly pine in West Gulf Coastal Plain using gradient boosting and random forests
  • Nov 17, 2020
  • Canadian Journal of Forest Research
  • X.W Lou + 6 more

Predicting future stand yield as a function of current stand conditions is important to forest managers. Two machine-learning techniques, gradient boosting (GB) and random forests (RF), were used to predict stand mean height of dominant and codominant trees (HT), trees per hectare (Tree·ha−1), and basal area per hectare (BA·ha−1) based on data sets collected from extensively and intensively managed loblolly pine (Pinus taeda L.) plantations in the West Gulf Coastal Plain region. Models were evaluated using coefficient of determination (R2) and bias by applying models to independent tests and validation data sets and then comparing to conventional statistical models (Coble-2017) currently being used in the region. For extensively managed plantations, the GB models had less bias than the RF models. For model precision (R2), the GB models were consistently better than the RF models, and the HT model was the best, followed by those of Tree·ha−1 and BA·ha−1. Even for BA·ha−1, the GB and RF models had R2 over 0.81. GB and RF models outperformed the Coble-2017 model; differences were not substantial for Tree·ha−1 but were significant for HT and BA·ha−1 (R2 = 0.96, 0.95, and 0.88 for HT and 0.84, 0.81, and 0.76 for BA). Important predictors identified by GB and RF and their contributions to the models were similar. For intensively managed plantations, GB and RF were similarly accurate in predicting HT and Tree·ha−1, but GB outperformed RF in predicting BA·ha−1 (R2 = 0.87 versus 0.75). We conclude that both GB and RF, although the former is preferred, can be effective in predicting future stand attributes. Forest managers can use the models presented here to predict quantitative information required for managing loblolly pine plantations in the region.
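
The two evaluation measures named in this abstract, the coefficient of determination and bias, can be sketched as follows. The sign convention for bias (predicted minus observed) is an assumption, since the abstract does not define it, and the numbers are invented.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_bias(observed, predicted):
    """Average of (predicted - observed); positive means over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Invented stand heights in metres, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2 = r_squared(observed, predicted)
bias = mean_bias(observed, predicted)
print(r2, bias)
```

The example shows why both measures are reported: the errors here cancel out to zero bias while still lowering R².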

  • Research Article
  • 10.1007/s11739-025-04034-x
Validation of syncope short-term outcomes prediction by machine learning models in an Italian emergency department cohort.
  • Jul 16, 2025
  • Internal and emergency medicine
  • Alessandro Giaj Levra + 7 more

Machine learning (ML) algorithms have the potential to enhance the prediction of adverse outcomes in patients with syncope. Recently, gradient boosting (GB) and logistic regression (LR) models have been applied to predict these outcomes following a syncope episode, using the Canadian Syncope Risk Score (CSRS) predictors. This study aims to externally validate these models and compare their performance with novel models. We included all consecutive non-low-risk patients evaluated in the emergency department for syncope between 2015 and 2017 at six Italian hospitals. The GB and LR models were trained and tested using previously validated CSRS predictors. Additionally, recently developed deep learning (TabPFN) and large language models (TabLLM) were validated on the same cohort. The area under the curve (AUC), Matthews correlation coefficient (MCC), and Brier score (BS) were compared for each model. A total of 257 patients were enrolled, with a median age of 71 years. Thirteen percent had adverse outcomes at 30 days. The GB model achieved the best performance, with an AUC of 0.78, an MCC of 0.36, and a BS of 0.42. Significant performance differences were observed compared with the TabPFN model (p < 0.01) and the TabLLM model (p = 0.01). The GB model performed only slightly better than the LR model. The predictive capability of the GB and LR models using CSRS variables was reduced when validated in an external syncope cohort characterized by a higher event rate.
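
For readers less familiar with the Matthews correlation coefficient and the Brier score mentioned in this abstract, here is a minimal sketch of both using their standard definitions. The counts and probabilities are hypothetical and unrelated to the cohort.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from confusion-matrix counts; ranges from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical values for illustration only.
mcc = matthews_corrcoef(tp=40, fp=10, fn=10, tn=40)
brier = brier_score([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4])
print(mcc, brier)
```

A higher MCC is better, while a lower Brier score indicates better-calibrated probabilities.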

  • Research Article
  • 10.1016/j.joim.2025.11.003
Predicting traditional Chinese medicine constitutions in adults aged ≥ 65 years: A machine learning approach.
  • Jan 1, 2026
  • Journal of integrative medicine
  • Chen Sun + 9 more


  • Research Article
  • 10.1097/01.ee9.0000609896.90483.c7
Comparison of Machine Learning Techniques for Spatio-Temporal Air Temperature Modelling using Earth Observation Satellites
  • Oct 1, 2019
  • Environmental Epidemiology
  • Schneider Dos Santos R + 1 more

OPS 03: Machine learning in environmental epidemiology, Room 315, Floor 3, August 26, 2019, 4:30 PM - 5:30 PM Background/Aim: Epidemiological studies use long-period air temperature series to quantify health risks related to heat. Typically, temperature is measured from meteorological stations, which have limitations in characterising its spatial patterns, due to landscape heterogeneity and sparseness of monitors. Analyses of satellites-remote sensing observations using machine learning methods (ML-methods) can overcome such limitations, but while several different models have been explored, little evidence exists about their individual and relative performance under the same study area. This study aims to compare alternative ML-methods to produce spatio-temporal predictions of maximum temperature (Tmax). Methods: Five ML-methods (Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM), and Neural Network (NN)) were investigated to predict London’s Tmax, using a data set from 12-summers (2006–2017) based on meteorological variables and temporal, spatial, and spatio-temporal predictors. The predictors’ rank-order was performed by Stepwise Linear (SL) regression, and the best group of predictors was split into 70:30 training/testing samples to validate each ML-method. Results: The optimal set of predictors was identified as land surface temperature, Julian day, elevation, normalised difference vegetation index, solar-zenith angle, distance from the coast, and longitude. The comparison across ML-methods indicated that the GB model performed the best, with R2=0.65 and root-mean-square error (RMSE) of 2.43°C. The RF, SVM, and NN had comparatively good performances, with R2 and RMSE ranging in 0.61-0.59 and 2.56°C-2.64°C, respectively. All these models improved the prediction obtained by the standard SL method. The DT model showed instead much lower predictive ability, with R2=0.30 and RMSE=3.44°C. 
Conclusion: This comparative analysis demonstrated the predictive power of ML-methods over SL-method. The GB model showed the best predictive ability, followed by RF, SVM, and NN, while the performance of the DT method is sub-optimal. Further research will assess the properties of alternative ML-methods in more general settings.

  • Research Article
  • 10.1371/journal.pone.0323949
Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population
  • Jun 17, 2025
  • PLOS One
  • Sazzli Kasim + 7 more

Background: Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia. Objective: This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds. Methods: Utilizing data from the REDISCOVER Registry (5,688 participants from 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN) and Naive Bayes (NB). Ensemble models were also created using three commonly used meta learners: RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split into a 70:30 train-test ratio, with 5-fold cross-validation to ensure robust performance. Model evaluation was primarily based on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) to compare the ML models against traditional risk scores like the Framingham Risk Score (FRS) and Revised Pooled Cohort Equations (RPCE). Results: The LR model achieved the highest AUC of 0.77, outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model provided robust performance, though it did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, ensuring ongoing validation and clinical applicability. Conclusions: These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.
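
The Net Reclassification Index reported in this abstract compares how a new model moves people between risk categories relative to a reference score. The sketch below uses the usual categorical definition; whether the study used this exact variant is not stated, and all counts are hypothetical.

```python
def net_reclassification_index(events_up, events_down, n_events,
                               nonevents_up, nonevents_down, n_nonevents):
    """Categorical NRI.

    For people who had the event, moving to a higher risk category is an
    improvement; for people who did not, moving to a lower category is.
    """
    event_part = (events_up - events_down) / n_events
    nonevent_part = (nonevents_down - nonevents_up) / n_nonevents
    return event_part + nonevent_part


# Hypothetical reclassification counts (new model vs. reference score).
nri = net_reclassification_index(
    events_up=20, events_down=10, n_events=100,
    nonevents_up=15, nonevents_down=30, n_nonevents=200,
)
print(f"NRI = {nri:.3f}")
```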

  • Research Article
  • 10.1111/iej.70106
Machine Learning Model for Predicting Postoperative Pain in Cases of Irreversible Pulpitis.
  • Feb 1, 2026
  • International endodontic journal
  • Pedro Felipe De Jesus Freitas + 9 more

Postoperative pain is a frequent clinical concern following endodontic treatment. This study aimed to develop and validate supervised machine learning models to predict the occurrence of postoperative pain in cases of irreversible pulpitis. A prospective sample of 354 patients aged 18 to 60 years undergoing standardised endodontic treatment was analysed. In the original randomised clinical trials from which the data were derived, each patient had only one eligible tooth included. Clinical variables included postoperative pain at 24 and 72 h, treated tooth group, occlusal reduction, photobiomodulation therapy, use of non-steroidal anti-inflammatory drugs (NSAIDs), sex and age. Eight supervised machine learning algorithms were trained to predict pain occurrence, including Logistic Regression, Support Vector Machine, Gradient Boosting, Random Forest, Decision Tree, K-Nearest Neighbours, AdaBoost and Multilayer Perceptron. The dataset was divided into training (70%) and testing (30%) sets using stratified sampling. Class imbalance in the training set, characterised by a lower proportion of cases with moderate or severe pain, was addressed using the Synthetic Minority Oversampling Technique. Hyperparameters were optimised through grid search combined with stratified five-fold cross validation. Model performance was evaluated using the area under the curve (AUC), accuracy, precision, recall and F1-score, with 95% confidence intervals estimated by bootstrapping. The predictive models achieved good discrimination of pain outcomes. Logistic Regression showed the best test performance at 24 h (AUC 0.74 [95% CI: 0.61 to 0.85], precision 0.81 [95% CI: 0.73 to 0.88]). At 72 h, the Support Vector Machine achieved the highest performance (AUC 0.81 [95% CI: 0.69 to 0.92], precision 0.88 [95% CI: 0.79 to 0.94]). Age and sex emerged as the most influential predictors across models. 
Supervised machine learning models demonstrated promising performance for predicting postoperative pain following endodontic treatment. Logistic Regression and Support Vector Machine algorithms presented the most consistent results, supporting their potential clinical application for personalised pain management.
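
The bootstrapped 95% confidence intervals mentioned in this abstract can be approximated with a percentile bootstrap over the test set. This is a minimal sketch under that assumption; the abstract does not say which bootstrap variant or how many resamples were used, and the labels and predictions below are invented.

```python
import random


def bootstrap_ci(labels, preds, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a metric computed on paired labels/preds."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        # Resample test cases with replacement, keeping label/pred pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([labels[i] for i in idx], [preds[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def accuracy(labels, preds):
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


# Invented test-set labels and predictions.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
ci_low, ci_high = bootstrap_ci(labels, preds, accuracy)
print(accuracy(labels, preds), ci_low, ci_high)
```

With small test sets the interval is wide, which is consistent with the broad intervals reported in the abstract.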

  • Research Article
  • Cite count: 43
  • 10.3390/app12052280
A Comparative Assessment of Machine Learning Models for Landslide Susceptibility Mapping in the Rugged Terrain of Northern Pakistan
  • Feb 22, 2022
  • Applied Sciences
  • Naeem Shahzad + 2 more

This study investigated the performances of different techniques, including random forest (RF), support vector machine (SVM), maximum entropy (maxENT), gradient-boosting machine (GBM), and logistic regression (LR), for landslide susceptibility mapping (LSM) in the rugged terrain of northern Pakistan. Initially, a landslide inventory of 200 samples was produced along with an additional 200 samples indicating nonlandslide areas and divided into training (70%) and validation (30%) groups using a stratified loop-based random sampling approach. Then, a geospatial database of 12 possible landslide influencing factors (LIFs) was generated, including elevation, slope, aspect, topographic wetness index (TWI), topographic position index (TPI), distance to drainage, distance to fault, distance to road, normalized difference vegetation index (NDVI), rainfall, land cover/land use (LCLU), and a geological map of the study area. None of the LIFs were redundant for the modeling, as indicated by the multicollinearity test (tolerance > 0.1) and information gain ratio (IGR > 0). We extended the evaluation measures of each algorithm from area-under-the-curve (AUC) analysis to the calculation of performance overall (POA) with the help of precision, recall, F1 score, accuracy (ACC), and Matthews correlation coefficient (MCC). The results showed that the SVM was the most promising model (AUC = 0.969, POA = 2669) for the LSM, followed by RF (AUC = 0.967, POA = 2656), GBM (AUC = 0.967, POA = 2623), maxENT (AUC = 0.872, POA = 1761), and LR (AUC = 0.836, POA = 1299). It is important to note that the SVM, RF, and GBM were the top performers, with almost similar accuracy. Thus, each of these could be equally effective for LSM and can be used for risk reduction and mitigation measures in the rugged terrain of Pakistan and other regions with similar topography.

  • Research Article
  • Cite count: 2
  • 10.21873/anticanres.17619
Four Different Artificial Intelligence Models Versus Logistic Regression to Enhance the Diagnostic Accuracy of Fecal Immunochemical Test in the Detection of Colorectal Carcinoma in a Screening Setting.
  • May 27, 2025
  • Anticancer research
  • Maaret Eskelinen + 5 more

This study aimed to evaluate the diagnostic accuracy (DA) of four artificial intelligence (AI) models compared to logistic regression (LR) in enhancing the performance of the fecal immunochemical test (FIT) for the detection of colorectal carcinoma (CRC). The study cohort comprised 544 patients with colorectal neoplasia (CRN), including 58 CRC and 486 non-CRC cases, recruited from the Barretos Cancer Hospital. Each patient provided three consecutive fecal samples, which were analyzed using two fecal occult blood (FOB) assays: ColonView-FIT (CV) and HemoccultSENSA. Four AI models - gradient boosting machine (GBM), neural network (NN), random forest (RF), and support vector machine (SVM) - were developed, incorporating clinical features and CV results. Diagnostic performance was assessed via hierarchical summary receiver operating characteristic (HSROC) curves. In conventional analysis, the area under the curve (AUC) values for different AI models ranged from 0.926 to 0.977, while the highest AUC values were reached by gradient boosting machine (GBM), neural network (NN), and random forest (RF) models (0.974, 0.976 and 0.977, respectively). In the HSROC analysis, the AUC values for i) 'low risk' variables, ii) 'high risk' variables, and iii) AI models were as follows: i) AUC=0.503 (95% CI=0.390-0.613), ii) AUC=0.773 (95% CI=0.713-0.837), and iii) AUC=0.958 (95% CI=0.930-0.989). In all comparisons of the AUC values, the difference was highly significant (p<0.0001). AI models outperformed conventional LR and non-AI diagnostic features in improving FIT-based CRC screening. This is the first study to show that combining clinical data with FIT results in AI frameworks can significantly improve diagnostic accuracy in CRC screening.

  • Research Article
  • Cite count: 58
  • 10.3802/jgo.2019.30.e65
Prediction of survival outcomes in patients with epithelial ovarian cancer using machine learning methods
  • Mar 11, 2019
  • Journal of Gynecologic Oncology
  • E Sun Paik + 9 more

Objectives: The aim of this study was to develop a new prognostic classification for epithelial ovarian cancer (EOC) patients using gradient boosting (GB) and to compare the accuracy of the prognostic model with the conventional statistical method. Methods: Information of EOC patients from Samsung Medical Center (training cohort, n=1,128) was analyzed to optimize the prognostic model using GB. The performance of the final model was externally validated with patient information from Asan Medical Center (validation cohort, n=229). The area under the curve (AUC) by the GB model was compared to that of the conventional Cox proportional hazard regression analysis (CoxPHR) model. Results: In the training cohort, the AUC of the GB model for predicting second year overall survival (OS), with the highest target value, was 0.830 (95% confidence interval [CI]=0.802–0.853). In the validation cohort, the GB model also showed high AUC of 0.843 (95% CI=0.833–0.853). In comparison, the conventional CoxPHR method showed lower AUC (0.668 (95% CI=0.617–0.719) for the training cohort and 0.597 (95% CI=0.474–0.719) for the validation cohort) compared to GB. New classification according to survival probability scores of the GB model identified four distinct prognostic subgroups that showed more discriminately classified prediction than the International Federation of Gynecology and Obstetrics staging system. Conclusion: Our novel GB-guided classification accurately identified the prognostic subgroups of patients with EOC and showed higher accuracy than the conventional method. This approach would be useful for accurate estimation of individual outcomes of EOC patients.

  • Research Article
  • 10.1200/jco.2025.43.5_suppl.647
Machine learning model integrating CT radiomics and circulating microRNAs to predict residual disease histology in metastatic non-seminoma testicular cancer (mNSTC).
  • Feb 10, 2025
  • Journal of Clinical Oncology
  • Guliz Ozgun + 14 more

647 Background: The primary treatment of most mNSTC is chemotherapy followed by surgery if the residual disease (RD) is >1 cm. However, conventional imaging lacks the specificity to characterize the tissue, often leading to overtreatment. This study hypothesizes that integrating CT-driven radiomics features with plasma miR371 and miR375 will enhance the predictive accuracy of Machine Learning (ML) models to predict teratoma, viable germ cell (vGCT) and fibrosis/necrosis (F/N) in mNSTC patients with RD. Methods: 111 lesions from 52 patients, including residual teratoma (n=57), F/N (n=33), vGCT (n=10), and additional seminoma (n=11) for training purposes were included, split into training (N=78) and test cohorts (N=33). Lesions were lymph nodes (n=87), lung (n=21), and brain (n=3) with a median size of 1.6 cm (Q1-Q3 interval=1.2-2.73 cm). 3D Slicer version 5.6.1 was used to segment the RD > 1 cm (short axis) and extract radiomics features. Plasma miRNA levels before resection were measured by RT-PCR. Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), and CatBoost (CB) ML models were evaluated to define the operating characteristics of radiomics alone (R-only) and in combination with miR371 (371) and/or miR375 (375) levels in predicting teratoma, vGCT and F/N. Results: For predicting teratoma, the best models were RF (R+375 and R+371+375), CB (R+371+375), and GB (R+371 and R+371+375). While adding miR371 or miR375 to R-only slightly improved AUC across models, the best results were achieved with the R+375+371 dataset. CB achieved AUCs ranging from 0.94 to 0.97 in training and 0.81 to 0.93 in test sets, with its highest AUC of 0.93 (95% CI: 0.78-0.97) on the R+375+371 dataset to differentiate all three classes. Similarly, GB demonstrated strong performance, achieving its highest AUC of 0.93 (95% CI: 0.79-0.96) on the R+375+371 dataset (Table). Conclusions: Integration of plasma miR371, miR375 and radiomics improved accuracy of predicting histologies across all ML models. These methods could be used to characterize the histology of RD in mNSTC patients to better inform treatment decisions. Further refinement, including incorporation of histological findings of the primary tumor, will be reported.

AUC values of different ML algorithms on training and test sets.

Training set (AUC ± SD)
Model   R           R+375       R+371       R+375+371
RF      0.93±0.05   0.95±0.04   0.95±0.03   0.96±0.04
SVM     0.84±0.06   0.84±0.09   0.89±0.11   0.89±0.09
GB      0.94±0.04   0.91±0.08   0.95±0.05   0.97±0.03
CB      0.95±0.03   0.94±0.03   0.94±0.04   0.97±0.03

Test set (AUC, 95% CI)
Model   R                  R+375              R+371              R+375+371
RF      0.8 (0.59-0.89)    0.85 (0.72-0.93)   0.87 (0.76-0.95)   0.91 (0.78-0.95)
SVM     0.72 (0.54-0.80)   0.74 (0.56-0.82)   0.83 (0.69-0.92)   0.84 (0.76-0.94)
GB      0.84 (0.61-0.96)   0.89 (0.77-0.97)   0.89 (0.79-0.96)   0.93 (0.79-0.96)
CB      0.81 (0.6-0.93)    0.86 (0.73-0.94)   0.89 (0.78-0.97)   0.93 (0.78-0.97)

  • Research Article
  • Cite count: 1
  • 10.21037/jtd-24-1067
Diagnostic artificial intelligence model predicts lymph node status in non-small cell lung cancer using simplified examination.
  • Nov 1, 2024
  • Journal of thoracic disease
  • Ryuichi Yoshimura + 7 more

Artificial intelligence (AI) technology has been introduced into the medical data field and applied to disease prediction models. This study aimed to establish an AI model for predicting lymph node metastasis based on simple medical examinations in patients with non-small cell lung cancer (NSCLC). We retrospectively analyzed 988 patients with NSCLC who underwent radical pulmonary resection with mediastinal lymph node dissection between January 2011 and October 2022. We collected clinical characteristics including age, sex, smoking history, tumor marker levels, tumor side, segment location, total tumor size, solid tumor size and consolidation-to-tumor ratio, obtainable from medical interview, blood tests and plain computed tomography (CT) of the chest. All patients were randomly assigned to a training set (n=790) or a validation set (n=198). Six algorithms, namely Support Vector Classification (SVC), k-nearest neighbor algorithm (k-NN), logistic regression (LR), random forest (RF), gradient boosting (GB) and multilayer perceptron (MLP), were built to predict lymph node metastasis. The GB model showed the best diagnostic performance, with 80.0% accuracy, 95.6% specificity and an area under the curve (AUC) of 0.75. An AI model showed high specificity and accuracy for predicting lymph node metastasis. These models have potential to categorize suitable surgical procedures for NSCLC patients without needing contrast-enhanced CT or positron emission tomography.

  • Research Article
  • Cite count: 1
  • 10.1038/s41598-025-15791-2
Machine learning algorithms for voltage stability assessment in electrical distribution systems
  • Aug 30, 2025
  • Scientific Reports
  • Molla Addisu Mossie + 4 more

Voltage instability poses a significant challenge by limiting power system operation and transmission capacity. Rapid detection and effective corrective actions are essential to prevent voltage collapse. However, traditional methods for assessing voltage security margins are computationally intensive and often impractical for real-time applications. This study addresses voltage stability assessment in power systems using machine learning (ML) to overcome the computational limitations of traditional methods. By employing Linear Regression (LR), Random Forest (RF), Gradient Boosting (GB), and Support Vector Machine (SVM), we predict Fast Voltage Stability Indices (FVSI) at nominal load as well as under varying loads (10–150%) in 15 kV Ethiopian distribution networks: a 35-bus Bata feeder system and a 53-bus Papyrus feeder system. RF and GB models achieved superior accuracy with R² values of 0.999 and 0.9998 respectively, significantly outperforming LR and SVM which exhibited substantial deviations. The GB model achieves the highest accuracy, with RMSE values of 0.0002 (53-bus) and 2.419e-05 (35-bus), while RF yields RMSE values of 0.0039 (53-bus) and 0.00120 (35-bus), demonstrating strong predictive performance. The FVSI threshold analysis revealed critical stability limits, with values approaching 1.0 indicating proximity to voltage collapse. The analysis identified buses 36, 32, and 21 in the 53-bus system (FVSI values: 0.087, 0.082, and 0.080) and buses 27 and 16 in the 35-bus system (FVSI values: 0.085 and 0.082) as critical instability risk points requiring immediate monitoring. These findings underscore the efficacy of ensemble methods for rapid voltage stability assessment and emphasize the need for targeted interventions in high-risk areas to bolster grid resilience in Ethiopian distribution networks.
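
The abstract does not give the formula for the Fast Voltage Stability Index. A commonly cited line-based form is FVSI = 4 Z² Q_r / (V_s² X), where Z and X are the line impedance and reactance, Q_r is the reactive power at the receiving end, and V_s is the sending-end voltage. The sketch below assumes that form and uses hypothetical per-unit values; the study's exact formulation may differ.

```python
def fvsi(z, x, q_receiving, v_sending):
    """Fast Voltage Stability Index for one line, in per-unit quantities.

    Assumed form: 4 * Z^2 * Q_r / (V_s^2 * X). Values approaching 1.0
    indicate that the line is close to its voltage stability limit.
    """
    return 4 * z ** 2 * q_receiving / (v_sending ** 2 * x)


# Hypothetical per-unit line data, not taken from the Bata or Papyrus feeders.
index = fvsi(z=0.1, x=0.08, q_receiving=0.5, v_sending=1.0)
print(round(index, 3))
```

In the workflow described above, an index like this would be computed from power-flow results and then used as the regression target for the ML models.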
