Machine learning approaches for improving load-dependent bearing fault diagnosis of nuclear facility components

Abstract

The nuclear industry is exploring applications of machine learning, including autonomous control and management of reactors, nuclear power plants, and their components. The accurate diagnosis and classification of motor bearing faults under diverse operating conditions remains a significant challenge due to the complex nature of signal patterns, overlapping features, and dynamic environments. This study presents a comparative analysis of multiple machine learning algorithms, including Random Forest, Gradient Boosting Machine, Decision Tree, Support Vector Machine (SVM), and K-Nearest Neighbours (KNN), applied to the HUST bearing dataset. This work also underscores the importance of load-dependent analysis, as fault signatures in bearings vary significantly with operating conditions. Experimental results indicate that ensemble models, particularly Random Forest, deliver superior performance across both binary and multi-class classification tasks, achieving up to 99.70% accuracy for 7-class cases and 99.37% for 21-class scenarios. Performance metrics further highlight the Random Forest model's robustness: it achieves 99.37% precision, recall, and F1-score with 15 features, supporting its suitability for real-time predictive maintenance applications. This study emphasises the importance of appropriate segmentation strategies and model selection, offering a reliable and scalable framework for industrial fault diagnostics and condition monitoring systems.
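The abstract reports accuracy, precision, recall, and F1-score for multi-class fault labels. As a minimal sketch of how such figures can be computed, the snippet below derives accuracy and macro-averaged precision, recall, and F1 from true and predicted labels. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the function name `macro_scores` and the label strings are hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging (an assumption, not stated in the abstract) gives
    every class the same weight regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical bearing-fault labels, for illustration only.
truth = ["normal", "inner", "outer", "normal"]
guess = ["normal", "inner", "inner", "normal"]
scores = macro_scores(truth, guess)
```

When classes are balanced, macro-averaged recall equals accuracy, which is one plausible reason a paper might report near-identical values for all four metrics.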

Similar Papers
  • Research Article
  • 10.1002/cpe.70325
Soil Nutrient Analysis and Yield Prediction With Neuro‐ ML Ensemble Model Using IoT ‐ WSN Approach: In Context to India's Agricultural Sector
  • Oct 21, 2025
  • Concurrency and Computation: Practice and Experience
  • Sandeep Bhatia + 2 more

Agriculture is the backbone of the Indian economy and of people's livelihoods. On agricultural land, soil is the most important factor determining production quality and efficiency. Phosphorus (P), Nitrogen (N), Potassium (K), and the potential of hydrogen (pH) are the key nutrients in soil. An efficient crop recommender and prediction system is needed to optimize agricultural practices, given the escalating demand for food. Traditional, time-consuming manual farming should be replaced with a smart agriculture framework that integrates technologies such as the Internet of Things (IoT), Wireless Sensor Networks (WSN), and Machine Learning (ML). This paper proposes an IoT-WSN driven crop management system with a Neuro-ML Ensemble Model, utilizing a LoRaWAN gateway, that can be deployed in the field to collect real-time soil parameters. For soil nutrient analysis, the authors evaluate ML algorithms including Naive Bayes (NB), Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), AdaBoost (AB), Gradient Boosting (GB), and Support Vector Machine (SVM), and recommend a suitable algorithm for the crop recommender system. For crop yield prediction, the authors developed and recommend a customized GB algorithm with an accuracy of 98.80%; for the fertilizer recommendation system, they suggest CNN-BiGRU, which outperforms other approaches such as BiGRU and CNN with an average accuracy of 92.48%. The work focuses on the Indian agriculture sector and compares ML algorithms on state-of-the-art datasets available on some government websites of India and used by other authors, as well as on a dataset the authors collected from hardware using a Raspberry Pi. For crop recommendation and forecasting, the Neuro-ML Ensemble model combines neural networks (NN) with the ML models. This research aims to help farmers choose crops suited to their environment and situation by analyzing and predicting which crops fit the parameters that affect crop growth, such as soil nutrients, soil moisture, soil pH, and rainfall. Using the open-access Kaggle dataset, the authors obtained accuracies of 99.54%, 96.36%, 95.90%, 96.81%, 98.86%, and 99.31% for NB, LR, KNN, SVM, DT, and RF, respectively. On the dataset collected by the authors, the accuracies were 94.54%, 91.36%, 92.72%, 92.73%, 86.36%, and 94.54% for NB, LR, KNN, SVM, DT, and RF, respectively. The authors found that Naive Bayes (NB) outperforms the other machine learning algorithms, such as KNN, SVM, LR, Decision Tree, RF, and AB, and is the best algorithm suited for crop yield.

  • Preprint Article
  • 10.5194/egusphere-egu25-10556
Machine Learning for High-Accuracy Co-Seismic Landslide Risk Prediction Using Multi-Parametric Data: A Case Study of M7.2 Hualien Earthquake
  • Mar 18, 2025
  • Yu Hsuan Ou Yang + 2 more

Taiwan, situated at the junction of the Ryukyu Arc and the Philippine Arc, is prone to frequent seismic activity due to its position at the boundary of tectonic plates. Earthquake-induced landslides are therefore among the most common geological hazards. For disaster mitigation, it is crucial to accurately predict the spatial distribution of such landslides after an earthquake. This study assesses the landslide risks triggered by the April 3rd, 2024, Hualien earthquake, which caused tremendous damage and claimed 18 lives, using multiple machine learning models, including Random Forest (RF), Support Vector Machines (SVM), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). Logistic Regression (LR) was not considered in this study because of its limitations for disaster prediction. While LR is advantageous when handling small datasets with limited independent variables, it faces significant drawbacks in high-dimensional and multi-variable scenarios. Moreover, the simplistic structure of LR tends to result in underfitting, causing inferior predictive performance. Furthermore, when dealing with large-scale data, the process becomes computationally intensive for LR. In contrast, machine learning models like RF, SVM, and GBM, along with ensemble techniques, are better suited for addressing the complexity of earthquake-induced landslide prediction.
The models were trained using a dataset comprising 3191 data points, including various topographic, geological, and seismic variables such as slope-related factors, curvature, elevation, aspect, lithology, peak ground acceleration (PGA), peak ground velocity (PGV), and distances to nearby faults and rivers. The dataset was labeled into two categories: coseismic landslide (CL) data labeled as 1 and non-coseismic landslide (NCL) data labeled as 0. To train and evaluate the models, the dataset was divided into two subsets: 70% was used as the training set to build and fine-tune the models, while the remaining 30% served as the test set to assess their predictive performance. The confusion matrices of the four models were the basis for comparing their performance. The accuracy of every model exceeds 0.95. Among them, the SVM model reached the highest at 0.9822, followed by GBM (0.9702), RF (0.9697), and KNN (0.9530). The better performance of SVM can be attributed to its ability to handle high-dimensional and nonlinear data more effectively, using kernel functions to transform the feature space and maximize the margin between classes, which enhances its classification precision and generalization capability.
To further enhance prediction reliability, an ensemble model was developed by integrating the RF, SVM, and GBM models, while the KNN model, showing the lowest accuracy, was excluded, ensuring that the number of models was odd. The final prediction of the ensemble model was determined by a vote among the outputs of the three models, substantially reducing prediction errors.
Compared to logistic regression models, the ensemble approach is more dependable. While logistic regression struggles with high-dimensional, non-linear, and strongly correlated geophysical variables, the ensemble model formed by three machine learning models (RF, SVM, and GBM) combines their strengths to tackle these challenges. By leveraging the models' diversity, the ensemble reduces overfitting and enhances the robustness of predictions, highlighting the ensemble model's capability in addressing the complexities of coseismic landslide prediction.
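The voting step described in this abstract can be sketched in a few lines. The example below is only an illustration of hard (majority) voting over the binary labels from three models; the function name `majority_vote` and the sample predictions are hypothetical, and the abstract does not say how the authors implemented the vote.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists into one label per sample.

    model_predictions is a list with one entry per model; each entry is
    that model's predicted labels for the same samples, in the same order.
    With an odd number of models and binary labels a tie cannot occur.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")

    ensemble = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _count = Counter(votes).most_common(1)[0]
        ensemble.append(label)
    return ensemble


# Hypothetical outputs: 1 = coseismic landslide, 0 = no landslide.
rf_pred = [1, 0, 1, 0]
svm_pred = [1, 1, 0, 0]
gbm_pred = [0, 1, 1, 0]
combined = majority_vote([rf_pred, svm_pred, gbm_pred])
```

Using three voters rather than four is what removes the possibility of a 2-2 split on a binary label, which matches the abstract's remark about keeping the number of models odd.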

  • Research Article
  • 10.1016/j.jrmge.2025.10.013
Data-driven prediction of strength in cement-treated clayey soils
  • Dec 1, 2025
  • Journal of Rock Mechanics and Geotechnical Engineering
  • Siau Chen Chian + 1 more


  • Research Article
  • 10.1200/jco.2025.43.5_suppl.647
Machine learning model integrating CT radiomics and circulating microRNAs to predict residual disease histology in metastatic non-seminoma testicular cancer (mNSTC).
  • Feb 10, 2025
  • Journal of Clinical Oncology
  • Guliz Ozgun + 14 more

647 Background: The primary treatment of most mNSTC is chemotherapy followed by surgery if the residual disease (RD) is >1 cm. However, conventional imaging lacks the specificity to characterize the tissue, often leading to overtreatment. This study hypothesizes that integrating CT-driven radiomics features with plasma miR371 and miR375 will enhance the predictive accuracy of Machine Learning (ML) models to predict teratoma, viable germ cell (vGCT) and fibrosis/necrosis (F/N) in mNSTC patients with RD. Methods: 111 lesions from 52 patients, including residual teratoma (n=57), F/N (n=33), vGCT (n=10), and additional seminoma (n=11) for training purposes were included, split into training (N=78) and test cohorts (N=33). Lesions were lymph nodes (n=87), lung (n=21), and brain (n=3) with a median size of 1.6 cm (Q1-Q3 interval=1.2-2.73 cm). 3D Slicer version 5.6.1 was used to segment the RD > 1 cm (short axis) and extract radiomics features. Plasma miRNA levels before resection were measured by RT-PCR. Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), and CatBoost (CB) ML models were evaluated to define the operating characteristics of radiomics alone (R-only) and in combination with miR371 (371) and/or miR375 (375) levels in predicting teratoma, vGCT and F/N. Results: For predicting teratoma, the best models were RF (R+375 and R+371+375), CB (R+371+375), and GB (R+371 and R+371+375). While adding miR371 or miR375 to R-only slightly improved AUC across models, the best results were achieved with the R+375+371 dataset. CB achieved AUCs ranging from 0.94 to 0.97 in training and 0.81 to 0.93 in test sets, with its highest AUC of 0.93 (95% CI: 0.78-0.97) on the R+375+371 dataset to differentiate all three classes. Similarly, GB demonstrated strong performance, achieving its highest AUC of 0.93 (95% CI: 0.79-0.96) on the R+375+371 dataset (Table). Conclusions: Integration of plasma miR371, miR375 and radiomics improved accuracy of predicting histologies across all ML models. These methods could be used to characterize the histology of RD in mNSTC patients to better inform treatment decisions. Further refinement, including incorporation of histological findings of the primary tumor, will be reported.

AUC values of different ML algorithms on training and test sets.

TRAINING SET (AUC ± SD)
Model   R           R+375       R+371       R+375+371
RF      0.93±0.05   0.95±0.04   0.95±0.03   0.96±0.04
SVM     0.84±0.06   0.84±0.09   0.89±0.11   0.89±0.09
GB      0.94±0.04   0.91±0.08   0.95±0.05   0.97±0.03
CB      0.95±0.03   0.94±0.03   0.94±0.04   0.97±0.03

TEST SET (AUC, 95% CI)
Model   R                  R+375              R+371              R+375+371
RF      0.8 (0.59-0.89)    0.85 (0.72-0.93)   0.87 (0.76-0.95)   0.91 (0.78-0.95)
SVM     0.72 (0.54-0.80)   0.74 (0.56-0.82)   0.83 (0.69-0.92)   0.84 (0.76-0.94)
GB      0.84 (0.61-0.96)   0.89 (0.77-0.97)   0.89 (0.79-0.96)   0.93 (0.79-0.96)
CB      0.81 (0.6-0.93)    0.86 (0.73-0.94)   0.89 (0.78-0.97)   0.93 (0.78-0.97)

  • Research Article
  • Cited by 3
  • 10.3389/fcvm.2021.741679
Machine Learning Algorithms to Detect Sex in Myocardial Perfusion Imaging.
  • Oct 29, 2021
  • Frontiers in cardiovascular medicine
  • Érito Marques De Souza Filho + 10 more

Myocardial perfusion imaging (MPI) is an essential tool used to diagnose and manage patients with suspected or known coronary artery disease. Additionally, the General Data Protection Regulation (GDPR) represents a milestone in addressing individuals' data security concerns. Meanwhile, Machine Learning (ML) has found applications in the most diverse knowledge areas and is regarded as a technology with huge potential to revolutionize health care. In this context, we developed ML models to evaluate their ability to distinguish an individual's sex from MPI assessment. We used 260 polar maps (140 men/120 women) to train ML algorithms from a database of patients referred to a university hospital for clinically indicated MPI from January 2016 to December 2018. We tested seven different ML models, namely, Classification and Regression Tree (CART), Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Adaptive Boosting (AB), Random Forests (RF), and Gradient Boosting (GB). We used a cross-validation strategy. Our work demonstrated that ML algorithms could perform well in assessing the sex of patients undergoing myocardial scintigraphy exams. All the models had accuracy greater than 82%, but only SVM achieved 90%; KNN, RF, AB, and GB had 88%, 86%, 85%, and 83%, respectively. Accuracy standard deviation was lower in KNN, AB, and RF (0.06). SVM and RF had the best area under the receiver operating characteristic curve (0.93), followed by GB (0.92), KNN (0.91), and AB and NB (0.9). SVM and AB achieved the best precision. Our results raise challenges regarding the autonomy of patients who wish to keep sex information confidential and add complexity to the debate about what data should be considered sensitive in light of the GDPR.

  • Research Article
  • 10.55640/ijbms-04-11-02
COMPARATIVE PERFORMANCE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR BUSINESS INTELLIGENCE: A STUDY ON CLASSIFICATION AND REGRESSION MODELS
  • Nov 28, 2024
  • International journal of business and management sciences
  • Md Nad Vi Al Bony

This study presents a comparative analysis of five widely used machine learning algorithms—Logistic Regression, Support Vector Machines (SVM), Random Forest, Gradient Boosting, and Neural Networks—in the context of business intelligence (BI). The performance of these models was evaluated on both classification and regression tasks, utilizing a comprehensive set of metrics including accuracy, precision, recall, F1 score, AUC-ROC for classification, and R-squared for regression. Results indicate that ensemble models, particularly Random Forest and Gradient Boosting, outperformed other algorithms across both tasks. Random Forest achieved the highest AUC-ROC (96.3%) in classification, while Gradient Boosting led with the highest F1 score (94.2%) and AUC-ROC (97.8%), reflecting its ability to model complex, non-linear relationships. In regression tasks, Gradient Boosting (R² = 0.94) and Random Forest (R² = 0.91) demonstrated superior explanatory power. While Neural Networks (R² = 0.93) performed well, their computational complexity and lack of interpretability pose challenges for certain BI applications. Logistic Regression and SVM, though effective in simpler contexts, were generally outperformed by more complex models. The findings emphasize the importance of selecting the appropriate model based on the business objectives, data characteristics, and computational resources, with ensemble methods being ideal for high-accuracy, complex BI tasks. This study contributes valuable insights for organizations aiming to leverage machine learning for data-driven decision-making and enhances the understanding of algorithmic trade-offs in business intelligence.

  • Research Article
  • Cited by 1
  • 10.37547/marketing-fmmej-04-11-06
COMPARATIVE PERFORMANCE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR BUSINESS INTELLIGENCE: A STUDY ON CLASSIFICATION AND REGRESSION MODELS
  • Nov 28, 2024
  • Frontline Marketing, Management and Economics Journal
  • Md Nad Vi Al Bony + 8 more

This study presents a comparative analysis of five widely used machine learning algorithms—Logistic Regression, Support Vector Machines (SVM), Random Forest, Gradient Boosting, and Neural Networks—in the context of business intelligence (BI). The performance of these models was evaluated on both classification and regression tasks, utilizing a comprehensive set of metrics including accuracy, precision, recall, F1 score, AUC-ROC for classification, and R-squared for regression. Results indicate that ensemble models, particularly Random Forest and Gradient Boosting, outperformed other algorithms across both tasks. Random Forest achieved the highest AUC-ROC (96.3%) in classification, while Gradient Boosting led with the highest F1 score (94.2%) and AUC-ROC (97.8%), reflecting its ability to model complex, non-linear relationships. In regression tasks, Gradient Boosting (R² = 0.94) and Random Forest (R² = 0.91) demonstrated superior explanatory power. While Neural Networks (R² = 0.93) performed well, their computational complexity and lack of interpretability pose challenges for certain BI applications. Logistic Regression and SVM, though effective in simpler contexts, were generally outperformed by more complex models. The findings emphasize the importance of selecting the appropriate model based on the business objectives, data characteristics, and computational resources, with ensemble methods being ideal for high-accuracy, complex BI tasks. This study contributes valuable insights for organizations aiming to leverage machine learning for data-driven decision-making and enhances the understanding of algorithmic trade-offs in business intelligence.

  • Research Article
  • 10.35854/1998-1627-2025-3-348-358
The application of machine learning algorithms for forecasting the quality of life index of the population
  • Apr 24, 2025
  • Economics and Management
  • Kh I Aminov + 1 more

Aim. The work aimed to investigate the possibilities of applying various machine learning algorithms to forecast the quality of life index of the population. Objectives. The work seeks to develop predictive models for analyzing the quality of life index of the population of selected countries (Germany, India, the Netherlands, Russia) using various machine learning algorithms based on historical data from the Numbeo website from 2012 to 2025; as well as to systematize and analyze the results of machine learning models for these countries. Methods. The study used machine learning models such as random forest, linear regression, gradient boosting, k-nearest neighbors, and support vector machine. Forecasting the quality of life index of the population is based on data on socio-economic factors for various countries presented in the Numbeo database. Results. A comparative analysis of the results of forecasting the quality of life index of the population of selected countries was performed using machine learning algorithms based on historical data from 2012 to 2025. Particular attention is paid to adjusting the hyperparameters of the models and cross-validation to improve the accuracy of predictions. The analysis demonstrated that the most reliable results can be obtained using an ensemble of machine learning models without taking into account linear regression forecasts. Conclusion. The calculations performed revealed that the gradient boosting model demonstrates the best results. However, in order to improve accuracy and reduce deviations, it is recommended to use an ensemble of models. The use of machine learning in forecasting offers new opportunities for the development of social government programs aimed at improving the quality of life of the population.

  • Research Article
  • Cited by 34
  • 10.1002/hsr2.962
Application of machine learning methods in predicting schizophrenia and bipolar disorders: A systematic review
  • Dec 28, 2022
  • Health Science Reports
  • Mahdieh Montazeri + 4 more

Background and Aim: Schizophrenia and bipolar disorder (BD) are critical and high-risk inherited mental disorders with debilitating symptoms. Worldwide, 3% of the population suffers from these disorders. The mortality rate of these patients is higher than that of other people. Current procedures cannot effectively diagnose these disorders because it takes an average of 10 years from the onset of the first symptoms to the definitive diagnosis of the disease. Machine learning (ML) techniques are used to meet this need. This study aimed to summarize information on the use of ML techniques for predicting schizophrenia and BD to help early and timely diagnosis of the disease. Methods: A systematic literature search included articles published until January 19, 2020 in 3 databases. Two reviewers independently assessed original papers to determine eligibility for inclusion in this review. PRISMA guidelines were followed to conduct the study, and the Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess included papers. Results: In this review, 1243 papers were retrieved through database searches, of which 15 papers were included based on full-text assessment. ML techniques were used to predict schizophrenia and BD. The main algorithms applied were support vector machine (SVM) (10 studies), random forests (RF) (5 studies), and gradient boosting (GB) (3 studies). Input and output characteristics were very diverse and have been kept to enable future research. RF algorithms demonstrated significantly higher accuracy and sensitivity than SVM and GB. GB demonstrated significantly higher specificity than SVM and RF. We found no significant difference between RF and SVM in terms of specificity. Conclusion: ML can precisely predict results and assist in making clinical decisions concerning schizophrenia and BD. RF often performed better than other algorithms in supervised learning tasks. This study identified gaps in the literature and opportunities for future psychological ML research.

  • Research Article
  • Cited by 20
  • 10.1088/1402-4896/ad562a
A machine learning ensemble approach for predicting solar-sensitive hybrid photocatalysts on hydrogen evolution
  • Jun 20, 2024
  • Physica Scripta
  • Rezan Bakır + 2 more

Hydrogen, as the lightest and most abundant element in the universe, has emerged as a pivotal player in the quest for sustainable energy solutions. Its remarkable properties, such as high energy density and zero emissions upon combustion, make it a promising candidate for addressing the pressing challenges of climate change and transitioning towards a clean and renewable energy future. In an effort to improve efficiency and reduce experimental costs, we adopted machine learning techniques in this study. Our focus turned to predictive analyses of hydrogen evolution values using three photocatalysts, namely, graphene-supported LaFeO3 (GLFO), graphene-supported LaRuO3 (GLRO), and graphene-supported BiFeO3 (GBFO), examining their correlation with varying levels of pH, catalyst amount, and H2O2 concentration. To achieve this, a diverse range of machine learning models are used, including Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), XGBoost, Gradient Boosting, and AdaBoost—each bringing its strengths to the predictive modeling arena. An important step involved combining the most effective models—Random Forests, Gradient Boosting, and XGBoost—into an ensemble model. This collaborative approach aimed to leverage their collective strengths and improve overall predictability. The ensemble model emerged as a powerful tool for understanding photocatalytic hydrogen evolution. Standard metrics were employed to assess the performance of our ensemble prediction model, encompassing R squared, Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and Mean Absolute Error (MAE). The yielded results showcase exceptional accuracy, with R squared values of 96.9%, 99.3%, and 98% for GLFO, GBFO, and GLRO, respectively. 
Moreover, our model demonstrates minimal error rates across all metrics, underscoring its robust predictive capabilities and highlighting its efficacy in accurately forecasting the intricate relationships between GLFO, GBFO, and GLRO values and their influencing factors.

  • Research Article
  • Cited by 8
  • 10.14569/ijacsa.2019.0100808
Mortality Prediction based on Imbalanced New Born and Perinatal Period Data
  • Jan 1, 2019
  • International Journal of Advanced Computer Science and Applications
  • Wafa M Alshwaish + 1 more

This study was carried out by the New York State Department of Health, between 2012 and 2016. This experiment relates to six supervised machine learning methods: Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosting (GB), Random Forest (RF), Deep Learning (DL) and the Ensemble Model, all of which are used in the prediction of infant mortality. The experiment applied an ensemble model that assigned different weights to different models per output class in order to obtain better predictive performance for infant mortality. Efforts were made to measure the performance and compare the classifier accuracy of each model. Several criteria, including the area under the ROC curve, were considered when comparing the ensemble model (GB, RF and DL) with the other five models (SVM, LR, DL, GB and RF). In terms of these criteria, the ensemble model outperformed the others in predicting survival rates among infant patients given a balanced data set (the areas under the ROC curve for minor, moderate, major and extreme were 98%, 95%, 92% and 97% respectively, giving a total accuracy of 80.65%). For the imbalanced dataset, the areas under the ROC curve for minor, moderate, major and extreme were 98%, 98%, 99% and 99% respectively, and total accuracy increased to 97.44%. The results of the experiments in this dissertation showed that the ensemble model provided a better level of prediction for infant mortality than the other five models, based on the relative prediction accuracy of each model for each output class. Therefore, the ensemble model provides an extremely promising classifier for predicting infant mortality.
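One way to read "assigning different weights to different models per output class" is as a class-wise weighted sum of each model's predicted probabilities. The sketch below illustrates that reading for a single sample. It is an assumption about the general technique, not the authors' implementation: the function name `weighted_class_vote`, the weight values, and the probabilities are all hypothetical.

```python
def weighted_class_vote(probas, weights):
    """Pick a class from per-model probabilities using per-class weights.

    probas[m][c]  : probability model m assigns to class c for one sample
    weights[m][c] : how much model m is trusted for class c (hypothetical
                    values; in practice these might come from per-class
                    validation accuracy)
    Returns (index of the winning class, list of combined scores).
    """
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("every model must score the same classes")
    if len(weights) != len(probas) or any(len(w) != n_classes for w in weights):
        raise ValueError("weights must have one value per model and class")

    scores = [
        sum(w[c] * p[c] for p, w in zip(probas, weights))
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=scores.__getitem__)
    return best, scores


# Two hypothetical models scoring two classes for one sample.
probas = [[0.6, 0.4], [0.3, 0.7]]
weights = [[1.0, 0.5], [0.5, 1.0]]
winner, combined = weighted_class_vote(probas, weights)
```

Compared with a single weight per model, per-class weights let a model that is strong on one severity level dominate the vote only for that level.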

  • Research Article
  • Cited by 66
  • 10.3390/ijgi10010042
Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements
  • Jan 19, 2021
  • ISPRS International Journal of Geo-Information
  • Kieu Anh Nguyen + 3 more

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
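The three scores quoted in this abstract (RMSE, Nash-Sutcliffe efficiency, and index of agreement) have standard definitions that are easy to compute directly. The sketch below uses the usual formulas; it assumes the "index of agreement" is Willmott's d, which the abstract does not state explicitly, and the function names are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the same units as the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (assumed definition), from 0 to 1."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - residual / potential


# Hypothetical erosion depths, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]  # always predicts the observed mean
baseline_nse = nse(observed, predicted)
```

A model that always predicts the observed mean scores NSE = 0, so the reported NSE of 0.54 means GBM explains roughly half of the variance left over by that baseline.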

  • Research Article
  • Cited by 57
  • 10.3390/diagnostics12123193
Ensemble Model for Diagnostic Classification of Alzheimer's Disease Based on Brain Anatomical Magnetic Resonance Imaging.
  • Dec 16, 2022
  • Diagnostics
  • Yusera Farooq Khan + 3 more

Alzheimer's is one of the fastest-growing diseases worldwide and leads to brain atrophy. Neuroimaging reveals extensive information about the brain's anatomy and enables the identification of diagnostic features. Artificial intelligence (AI) in neuroimaging has the potential to significantly enhance the treatment process for Alzheimer's disease (AD). The objective of this study is two-fold: (1) to compare existing Machine Learning (ML) algorithms for the classification of AD, and (2) to propose an effective ensemble-based model for the same task and to perform its comparative analysis. In this study, data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), an online repository, is used for experimentation, consisting of 2125 neuroimages of Alzheimer's disease (n = 975), mild cognitive impairment (n = 538) and cognitive normal (n = 612). For classification, the framework incorporates a Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), and K-Nearest Neighbor (K-NN), followed by some variations of Support Vector Machine (SVM), such as SVM (RBF kernel), SVM (Polynomial kernel), and SVM (Sigmoid kernel), as well as Gradient Boost (GB), Extreme Gradient Boosting (XGB) and Multi-layer Perceptron Neural Network (MLP-NN). Afterwards, an Ensemble Based Generic Kernel is presented in which a Master-Slave architecture is combined to attain better performance. The proposed model is an ensemble of Extreme Gradient Boosting, Decision Tree and SVM_Polynomial kernel (XGB + DT + SVM). Finally, the proposed method is evaluated with cross-validation and statistical techniques alongside the other ML models. The presented ensemble model (XGB + DT + SVM) outperformed existing state-of-the-art algorithms with an accuracy of 89.77%. All the models were then optimized using grid-based tuning, and the results obtained after this process showed significant improvement. XGB + DT + SVM with optimized parameters outperformed all other models with an efficiency of 95.75%. The proposed ensemble-based learning approach shows the best results compared to the other ML models. This experimental comparative analysis improved understanding of the above-defined methods and enhanced their scope and significance in the early detection of Alzheimer's disease.

  • Supplementary Content
  • Cited by 231
  • 10.30773/pi.2018.12.21.2
Review of Machine Learning Algorithms for Diagnosing Mental Illness
  • Apr 1, 2019
  • Psychiatry Investigation
  • Gyeongcheol Cho + 4 more

Objective: Advances in computer and internet technology have improved the scale and quality of data in various areas, including healthcare. Machine Learning (ML) has played a pivotal role in efficiently analyzing such big data, but general misunderstandings of ML algorithms still exist in applying them (e.g., that ML techniques can solve the problem of small sample size, or that deep learning is the ML algorithm). This paper reviews research on diagnosing mental illness using ML algorithms and suggests how ML techniques can be employed in practice. Methods: Studies on diagnosing mental illness using ML techniques were carefully reviewed. Five traditional ML algorithms frequently used in mental health research, namely Support Vector Machines (SVM), Gradient Boosting Machine (GBM), Random Forest, Naïve Bayes, and K-Nearest Neighborhood (KNN), were systematically organized and summarized. Results: The literature review showed that Support Vector Machines (SVM), Gradient Boosting Machine (GBM), Random Forest, Naïve Bayes, and K-Nearest Neighborhood (KNN) were frequently employed in the mental health area, but many researchers did not explain why they chose their ML algorithm, even though every ML algorithm has its own advantages. In addition, several studies applied ML algorithms without fully understanding the data characteristics. Conclusion: Researchers using ML algorithms should be aware of the properties of their ML algorithms and of the limitations of results obtained under restricted data conditions. This paper provides useful information on the properties and limitations of each ML algorithm in the practice of mental health.

  • Research Article
  • Cite Count Icon 97
  • 10.1007/s11657-020-00802-8
Application of machine learning approaches for osteoporosis risk prediction in postmenopausal women.
  • Oct 23, 2020
  • Archives of Osteoporosis
  • Jae-Geum Shim + 6 more

Osteoporosis is a silent disease until it results in fragility fractures. However, early diagnosis of osteoporosis provides an opportunity to detect and prevent fractures. We aimed to develop machine learning approaches to achieve high predictive ability for osteoporosis risk that could help primary care providers identify which women are at increased risk of osteoporosis and should therefore undergo further testing with bone densitometry. We included all postmenopausal Korean women from the Korea National Health and Nutrition Examination Surveys (KNHANES V-1, V-2) conducted in 2010 and 2011. Machine learning models using methods such as the k-nearest neighbors (KNN), decision tree (DT), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), artificial neural networks (ANN), and logistic regression (LR) were developed to predict osteoporosis risk. We analyzed the effect of applying the machine learning algorithms to the raw data and featuring the selected data only where the statistically significant variables were included as model inputs. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) were used to evaluate performance among the seven models. A total of 1792 patients were included in this study, of which 613 had osteoporosis. The raw data consisted of 19 variables and achieved performances (in terms of AUROCs) of 0.712, 0.684, 0.727, 0.652, 0.724, 0.741, and 0.726 for KNN, DT, RF, GBM, SVM, ANN, and LR with fivefold cross-validation, respectively. The feature selected data consisted of nine variables and achieved performances (in terms of AUROCs) of 0.713, 0.685, 0.734, 0.728, 0.728, 0.743, and 0.727 for KNN, DT, RF, GBM, SVM, ANN, and LR with fivefold cross-validation, respectively. In this study, we developed and compared seven machine learning models to accurately predict osteoporosis risk. 
The ANN model performed best when compared to the other models, having the highest AUROC value. Applying the ANN model in the clinical environment could help primary care providers stratify osteoporosis patients and improve the prevention, detection, and early treatment of osteoporosis.
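
The abstract above evaluates its models by accuracy, sensitivity, specificity, and AUROC. As a rough illustration of what those four metrics measure, here is a minimal pure-Python sketch. The labels, scores, and the 0.5 decision threshold are made-up toy values (assumptions for illustration, not data from the study), and the function names are hypothetical. AUROC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy_sensitivity_specificity(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = condition present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc, sens, spec = accuracy_sensitivity_specificity(y_true, y_pred)
print(acc, sens, spec)        # 0.75 0.75 0.75
print(auroc(y_true, scores))  # 0.9375
```

Unlike the other three metrics, AUROC does not depend on the chosen threshold, which is one reason it is commonly used to compare models, as in the abstract above. In a cross-validated setting, these metrics would be computed on each held-out fold and then averaged.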
