Using Machine Learning to Detect Malware in IoHT Systems
The Internet of Health Things (IoHT) is a network of healthcare equipment, software, and systems that enables remote monitoring and healthcare services. Real-time health data are gathered via sensors. Although the IoHT offers many benefits for modern smart healthcare, security concerns are increasing because IoHT devices lack adequate processing power, storage capacity, and self-defense capabilities. In the healthcare sector, the use of machine learning (ML) for malware detection is vital for safeguarding patients' sensitive data, so it is essential to improve the accuracy and effectiveness of detection methods. ML models have been utilized to enhance the efficiency of malware detection. The main objectives of attackers are to obtain personal information and exploit device flaws. Scientists are devising diverse methods for identifying and analyzing malware to address these challenges. Given the continuous introduction of new malware, it is highly challenging to construct comprehensive detection algorithms. Researchers have developed several ML and deep learning (DL) algorithms, whose precision is mainly contingent upon the size of the training dataset. Our work is divided into three primary stages: pre-processing, feature selection, and prediction. It introduces a feature-selection technique that integrates two approaches: Pearson correlation, which assesses the correlation between features to identify the significant ones, and an embedded method. The selected features are subsequently used in a classification model. Our method employs a soft voting classifier that combines multiple machine learning models (decision tree, logistic regression, gradient boosting, random forest, and support vector machine) to detect malware. This approach yields a single model that incorporates the strengths of the combined models, resulting in high prediction accuracy. The proposed methodology surpasses previous research, reaching an accuracy of 99.6%, an F1 score of 0.9972, a recall of 0.9998, and a precision of 0.9947.
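The pipeline described above maps naturally onto scikit-learn. Below is a minimal, hedged sketch of the two-stage feature selection (a Pearson-correlation filter plus an embedded tree-based selector) feeding a soft-voting ensemble of the five named classifiers; the dataset, threshold, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch of the described pipeline: a Pearson-correlation filter, an
# embedded (random-forest) selector, and a soft-voting ensemble of the five
# base models. Data, threshold, and hyperparameters are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def select_features(X, y, corr_threshold=0.05):
    """Keep features whose Pearson correlation with the label exceeds the
    threshold, then refine them with an embedded tree-based selector."""
    corr = X.corrwith(y)
    candidates = X.columns[corr.abs() > corr_threshold]
    if len(candidates) == 0:          # fall back if nothing clears the bar
        candidates = X.columns
    embedded = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
    embedded.fit(X[candidates], y)
    return candidates[embedded.get_support()]

# Placeholder feature matrix and binary malware labels.
X = pd.DataFrame(np.random.rand(500, 20), columns=[f"f{i}" for i in range(20)])
y = pd.Series(np.random.randint(0, 2, 500))

selected = select_features(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X[selected], y, test_size=0.2, random_state=0)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("gb", GradientBoostingClassifier()),
        ("rf", RandomForestClassifier()),
        ("svm", SVC(probability=True)),   # probabilities are required for soft voting
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print("test accuracy:", ensemble.score(X_te, y_te))
```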
- Research Article
- 10.5455/jjcit.71-1736013097
- Jan 1, 2025
- Jordanian Journal of Computers and Information Technology
The Internet of Health Things (IoHT) is a network of healthcare devices, software, and systems that enable remote monitoring and healthcare services by gathering real-time health data through sensors. Despite its significant benefits for modern smart healthcare, IoHT faces growing security challenges due to the limited processing power, storage capacity, and self-defense capabilities of its devices. While blockchain-based authentication solutions have been developed to leverage tamper-resistant decentralized designs for enhanced security, they often require substantial computational resources, increased storage, and longer authentication times, hindering scalability and time efficiency in large-scale, time-critical IoHT systems. To address these challenges, we propose a novel four-phase authentication scheme comprising setup, registration, authentication, and secret construction phases. Our scheme integrates chaotic-based public key cryptosystems, a Light Encryption Device (LED) with a 3-D Lorenz chaotic map algorithm, and blockchain-based fog computing technologies to enhance both efficiency and scalability. Simulated on the Ethereum platform using Solidity and evaluated with the JMeter tool, the proposed scheme demonstrates superior performance, with a computational cost reduction of 40% compared to traditional methods like Elliptic Curve Cryptography (ECC). The average latency for registration is 1.25 ms, while the authentication phase completes in just 1.50 ms, making it highly suitable for time-critical IoHT applications. Security analysis using the Scyther tool confirms that the scheme is resistant to modern cyberattacks, including 51% attacks and hijacking, while ensuring data integrity and confidentiality. Additionally, the scheme minimizes communication costs and supports the scalability of large-scale IoHT systems. These results highlight the proposed scheme’s potential to revolutionize secure and efficient healthcare monitoring, enabling real-time, tamper-proof data management in IoHT environments.
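The abstract names a Light Encryption Device (LED) cipher keyed by a 3-D Lorenz chaotic map. As a rough illustration of that idea only, and not the authors' construction, the sketch below derives key bytes from a numerically integrated Lorenz trajectory; the parameters, Euler discretization, and hash-based extraction are all assumptions.

```python
# Illustration only: deriving pseudo-random key bytes from a 3-D Lorenz system,
# one plausible reading of a "Lorenz chaotic map" keying a lightweight cipher.
# Parameters, discretization, and byte extraction are assumptions.
import hashlib

def lorenz_keystream(x, y, z, n_bytes, sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01):
    """Integrate the Lorenz equations with Euler steps and hash the sampled
    trajectory into key material; (x, y, z) acts as the shared secret seed."""
    samples = []
    for _ in range(n_bytes * 4):                 # oversample the trajectory
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        samples.append(f"{x:.12f}{y:.12f}{z:.12f}")
    return hashlib.shake_256("".join(samples).encode()).digest(n_bytes)

key = lorenz_keystream(0.1, 0.0, 0.0, 16)        # 128-bit key from the seed
print(key.hex())
```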
- Research Article
- 10.1097/md.0000000000038513
- Jun 14, 2024
- Medicine
To explore the value of machine learning (ML) models based on contrast-enhanced cone-beam breast computed tomography (CE-CBBCT) radiomics features for the preoperative prediction of human epidermal growth factor receptor 2 (HER2)-low expression breast cancer (BC). Fifty-six patients with HER2-negative invasive BC who underwent preoperative CE-CBBCT were prospectively analyzed. Patients were randomly divided into training and validation cohorts at approximately 7:3. A total of 1046 quantitative radiomic features were extracted from CE-CBBCT images and normalized using z-scores. The Pearson correlation coefficient and recursive feature elimination were used to identify the optimal features. Six ML models were constructed based on the selected features: linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT). Receiver operating characteristic curves and the area under the curve (AUC) were used to evaluate the performance of these models. Seven features were selected as the optimal features for constructing the ML models. In the training cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively. In the validation cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713, respectively. Among all ML models, the LDA and LR models demonstrated the best performance. The DeLong test showed no significant differences among the receiver operating characteristic curves of the ML models in the training cohort (P > .05); however, in the validation cohort, the differences between the AUC of LDA and those of RF, AB, and DT were statistically significant (P = .037, .003, .046), as were the differences between the AUC of LR and those of RF, AB, and DT (P = .023, .005, .030). No statistically significant differences were observed for the remaining model comparisons. ML models based on CE-CBBCT radiomics features achieved excellent performance in the preoperative prediction of HER2-low BC and could potentially serve as an effective tool to assist in precise and personalized targeted therapy.
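The selection step described above (z-score normalization, Pearson-based pruning of redundant features, then recursive feature elimination) maps onto scikit-learn as in the sketch below; the simulated data, correlation cutoff, and base estimator are placeholder assumptions.

```python
# Hedged sketch: z-score normalization, Pearson pruning of inter-correlated
# features, then RFE down to seven features. Data and thresholds are placeholders.
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X: radiomic features, y: HER2-low vs. HER2-zero labels (simulated here).
X = pd.DataFrame(np.random.rand(56, 100), columns=[f"rad_{i}" for i in range(100)])
y = np.random.randint(0, 2, 56)

X_z = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# Drop one feature from every highly inter-correlated pair (|r| > 0.9).
corr = X_z.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
keep = [c for c in X_z.columns if not any(upper[c] > 0.9)]

# RFE whittles the remaining features down to seven, as in the study.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=7)
rfe.fit(X_z[keep], y)
print("selected:", list(pd.Index(keep)[rfe.support_]))
```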
- Research Article
- 10.3390/app13106138
- May 17, 2023
- Applied Sciences
A mortality prediction model can be a great tool to assist physicians in decision making in the intensive care unit (ICU) in order to ensure optimal allocation of ICU resources according to the patient’s health conditions. The entire world witnessed a severe ICU patient capacity crisis a few years ago during the COVID-19 pandemic. Various widely utilized machine learning (ML) models in this research field can provide poor performance due to a lack of proper feature selection. Despite the fact that nature-based algorithms in other sectors perform well for feature selection, no comparative study on the performance of nature-based algorithms in feature selection has been conducted in the ICU mortality prediction field. Therefore, in this research, a comparison of the performance of ML models with and without feature selection was performed. In addition, explainable artificial intelligence (AI) was used to examine the contribution of features to the decision-making process. Explainable AI focuses on establishing transparency and traceability for statistical black-box machine learning techniques. Explainable AI is essential in the medical industry to foster public confidence and trust in machine learning model predictions. Three nature-based algorithms, namely the flower pollination algorithm (FPA), particle swarm optimization (PSO), and the genetic algorithm (GA), were used in this study. For the classification job, the most widely used and diversified classifiers from the literature were used, including logistic regression (LR), the decision tree (DT) classifier, the gradient boosting (GB) algorithm, and the random forest (RF) algorithm. The Medical Information Mart for Intensive Care III (MIMIC-III) dataset was used to collect data on heart failure patients. On the MIMIC-III dataset, it was discovered that feature selection significantly improved the performance of the described ML models. Without applying any feature selection process on the MIMIC-III heart failure patient dataset, the accuracy of the four mentioned ML models, namely LR, DT, RF, and GB, was 69.9%, 82.5%, 90.6%, and 91.0%, respectively, whereas with feature selection in combination with the FPA, the accuracy increased to 71.6%, 84.8%, 92.8%, and 91.1%, respectively, for the same dataset. Again, the FPA showed the highest area under the receiver operating characteristic curve (AUROC) value of 83.0% with the RF algorithm among all other algorithms utilized in this study. Thus, it can be concluded that the use of feature selection with the FPA has a profound impact on the outcome of ML models. Shapley additive explanations (SHAP) was used in this study to interpret the ML models. SHAP was chosen because it offers mathematical assurances for the precision and consistency of explanations, and it is trustworthy and suitable for both local and global explanations. It was found that the features selected by SHAP as most important largely overlapped with the features selected by the FPA. Therefore, we hope that this study will help physicians to predict ICU mortality for heart failure patients with a limited number of features and with high accuracy.
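Nature-based feature selection is a wrapper search over binary feature masks scored by model performance. The sketch below uses a basic genetic algorithm as a stand-in (FPA and PSO explore the same mask space with different update rules); the population size, operators, and simulated data are illustrative assumptions, not the study's setup.

```python
# Simplified wrapper-style feature selection in the spirit of the nature-based
# methods above, shown as a basic genetic algorithm. Fitness, operators, sizes,
# and the simulated data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((300, 15))                 # placeholder for MIMIC-III features
y = rng.integers(0, 2, 300)               # placeholder mortality labels

def fitness(mask):
    """CV accuracy of a classifier restricted to the masked features."""
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, (12, X.shape[1])).astype(bool)   # random masks
for _ in range(10):                                       # generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]           # truncation selection
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(0, 6, 2)]
        cut = rng.integers(1, X.shape[1])                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(X.shape[1]) < 0.05            # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))
```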
- Research Article
- 10.1109/access.2023.3255176
- Jan 1, 2023
- IEEE Access
In recent years, a significant amount of research has focused on analyzing the effectiveness of machine learning (ML) models for malware detection. These approaches have ranged from methods such as decision trees and clustering to more complex approaches like support vector machines (SVM) and deep neural networks. In particular, neural networks have proven to be very effective in detecting complex and advanced malware. This, however, comes with a caveat: neural networks are notoriously complex, so the decisions that they make are often just accepted without questioning why the model made that specific decision. The black-box characteristic of neural networks has challenged researchers to explore methods to explain black-box models such as SVMs and neural networks and their decision-making process. Transparency and explainability give experts and malware analysts assurance and trustworthiness about the ML models’ decisions. In addition, they help in generating comprehensive reports that can be used to enhance cyber threat intelligence sharing. As such, this much-needed analysis drives our work in this paper to explore the explainability and interpretability of ML models in the field of online malware detection. In this paper, we used the Shapley Additive exPlanations (SHAP) explainability technique to achieve efficient performance in interpreting the outcomes of different ML models such as SVM-Linear, SVM-RBF (Radial Basis Function), Random Forest (RF), Feed-Forward Neural Network (FFNN), and Convolutional Neural Network (CNN) models trained on an online malware dataset. To explain the output of these models, explainability techniques such as KernelSHAP, TreeSHAP, and DeepSHAP are applied to the obtained results.
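The TreeSHAP workflow applied here follows a standard explain-then-inspect pattern, sketched below on a placeholder random-forest malware classifier; the dataset and feature names are simulated. KernelSHAP (shap.KernelExplainer) and DeepSHAP (shap.DeepExplainer) are used analogously for SVM and neural-network models.

```python
# Sketch of the TreeSHAP pattern on a placeholder random-forest classifier.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

X = pd.DataFrame(np.random.rand(200, 8), columns=[f"feat_{i}" for i in range(8)])
y = np.random.randint(0, 2, 200)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeSHAP computes exact Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
vals = explainer.shap_values(X)
# Older shap versions return a per-class list; newer ones a 3-D array.
vals = vals[1] if isinstance(vals, list) else vals[..., 1]

# Global view: mean |SHAP| per feature ranks its overall importance.
importance = np.abs(vals).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```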
- Research Article
- 10.1111/nicc.70190
- Oct 9, 2025
- Nursing in critical care
Predicting mortality among trauma patients is a critical task that can guide clinical decision-making, better management and resource allocation in Intensive Care Units (ICU). Machine learning has been increasingly employed in clinical practice to effectively predict the mortality of critically ill patients. This study aimed to develop and evaluate the machine learning models for predicting the mortality of trauma patients. A multicentre cross-sectional study was conducted. The data were collected retrospectively from 613 trauma patients admitted to the ICU between January 1, 2020, and December 30, 2021, in comprehensive specialised hospitals of Northwest Ethiopia. The Kampala trauma score (KTS II) and revised trauma score (RTS) were calculated for each patient on admission, and the scores range from 5 to 10 and 0 to 7.84, respectively, with lower scores indicating more severe trauma and a higher risk of mortality. Pre-processing, feature selection and model fitting were done using Python version 3.12. Seven Machine Learning (ML) models, Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbours (KNN), Logistic Regression (LR), Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost), were developed to predict mortality among trauma patients at the time of hospital discharge. The dataset was divided into training (80%) and testing sets (20%), and a 10-fold cross-validation technique was employed to improve model performance. The models' prediction accuracy was measured using metrics derived from the confusion matrix, such as sensitivity, specificity, precision and Receiver Operating Characteristics (ROC). Of 613 trauma patients admitted to the intensive care units, 248 (40.5%) died. Among the different variables included, the Kampala trauma score (KTS II), Glasgow Coma Scale (GCS) score and presence of complications were the most reliable features for predicting mortality among trauma patients. This study found that the Random Forest (RF) algorithm outperformed other machine learning algorithms, achieving an accuracy of 95%, sensitivity of 96%, precision of 93%, F1 score of 94% and a Receiver Operating Characteristics (ROC) score of 99%. Moreover, Support Vector Machines (SVM) and XGBoost also performed exceptionally well, with AUC scores of 0.98 and 0.97, respectively. We found the Random Forest (RF) to be the best-performing machine learning model in predicting the mortality among trauma patients. This is the first machine learning model developed specifically for mortality prediction of trauma patients in Ethiopia. The application of machine learning algorithms is warranted to stratify the risk of mortality, enabling evidence-based intervention and maximising resource utilisation. Thus, further external validation on independent data from prospective studies is needed to evaluate the universal applicability of the model to clinical practice. This study holds substantial value for clinical practice by enhancing decision support, enabling early identification of high-risk patients and supporting proactive surveillance and timely interventions. The integration of machine learning for mortality prediction is particularly impactful, as it facilitates remote monitoring and telemedicine, helping to bridge gaps in healthcare access. Additionally, it aids in optimising treatment strategies through patient-centred data, informs health planning and resource allocation, supports personalised care and advances data-driven research and policy-making.
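The evaluation protocol described above (an 80/20 split, 10-fold cross-validation, and confusion-matrix metrics) can be sketched as follows; the data and random-forest settings are simulated placeholders, not the study's cohort.

```python
# Hedged sketch of the evaluation protocol: stratified 80/20 split, 10-fold CV
# on the training set, and confusion-matrix metrics on the hold-out set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X = np.random.rand(613, 10)                 # placeholder trauma features
y = np.random.randint(0, 2, 613)            # placeholder mortality labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
print("10-fold CV accuracy:", cross_val_score(model, X_tr, y_tr, cv=10).mean())

model.fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print("sensitivity:", tp / (tp + fn))        # recall for the positive class
print("specificity:", tn / (tn + fp))
print("precision:  ", tp / (tp + fp))
print("ROC AUC:    ", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```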
- Preprint Article
- 10.5194/ems2025-562
- Jul 16, 2025
Machine learning (ML) and deep learning (DL) models can play an important role when it comes to modelling complicated processes. Such capability is necessary for hydrological and climate-related applications. Generally, ML models utilize precipitation and temperature time series of a basin as input to develop a lumped rainfall-runoff model to simulate streamflow at the basin outlet. However, when a basin is divided into several sub-basins, Graph Neural Networks (GNN) can consider each sub-basin as a node and link them together using a connectivity matrix to account for spatial variations of hydroclimatic variables. In this study, a GNN and various ML models with different types of architecture, ranging from neural networks to tree-based structures and gradient boosting, were exploited for daily streamflow simulation over different case studies. For each case study, the basin was divided into a few sub-basins for which daily precipitation and temperature data were aggregated and used as input. For training the GNN, the connectivity matrix of sub-basins was also used as input. 75% of the historical records were utilized to train the GNN and the different ML models, e.g., artificial neural networks, support vector machine, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Category Boosting (CatBoost), while the rest were used for testing. Streamflow simulation was conducted with and without considering seasonality impacts and lag times. The obtained results clearly demonstrate that considering seasonality and time lags can enhance the accuracy of streamflow predictions based on the Kling–Gupta efficiency (KGE). Furthermore, the GNN with seasonality impacts and time lags achieved promising results across different case studies, with KGE > 0.85 on training data and KGE > 0.59 on testing data. Among the ML models, boosting models, e.g., LightGBM and XGBoost, performed slightly better than the others. Finally, this comparative analysis provides valuable insights for ML/DL applications in climate change impact assessments. Acknowledgements: This research work was carried out as part of the TRANSCEND project with funding received from the European Union Horizon Europe Research and Innovation Programme under Grant Agreement No. 10108411.
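Conceptually, the GNN treats each sub-basin as a node whose features are propagated along the connectivity matrix. One round of normalized message passing, sketched below in plain NumPy, illustrates the mechanism; the graph, dimensions, and weights are illustrative, not the study's architecture.

```python
# Conceptual sketch: sub-basins as nodes carrying aggregated precipitation and
# temperature features; a normalized connectivity matrix propagates information.
import numpy as np

rng = np.random.default_rng(0)
n_subbasins, n_features, hidden = 4, 6, 8

X = rng.random((n_subbasins, n_features))     # per-node hydroclimatic inputs

# Connectivity matrix: A[i, j] = 1 if sub-basin j drains into sub-basin i.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
A_hat = A + np.eye(n_subbasins)               # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))      # row-normalization

W = rng.standard_normal((n_features, hidden))
H = np.tanh(D_inv @ A_hat @ X @ W)            # one round of message passing

# A readout at the outlet node (index 0 here) would feed a streamflow head.
print("outlet embedding:", H[0])
```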
- Research Article
- 10.55859/ijiss.1510423
- Dec 29, 2024
- International Journal of Information Security Science
The rapid evolution of malware presents significant challenges in cybersecurity. This study investigates the efficacy of various machine learning and ensemble learning models for malware detection using dynamic analysis. The dynamic datasets contain API calls and permissions, enabling real-time monitoring of malware behavior. For both the VirusSample and VirusShare datasets, the random forest (RF) model achieved the best results among the machine learning models, with accuracies of 94.69% and 85.72%, respectively. For the VirusSample dataset, the stacking ensemble learning model that uses RF and decision trees (DT) as base classifiers and K-nearest neighbors (KNN) as the meta-classifier achieved the highest accuracy among the stacking models, at 94.52%. In contrast, for the VirusShare dataset, the stacking ensemble learning model that uses RF, KNN, and gradient boosting (GB) as base classifiers and a support vector machine (SVM) as the meta-classifier achieved the highest accuracy, at 85.7%. These results underscore the superiority of dynamic analysis and the effectiveness of ensemble methods in enhancing malware detection accuracy. This study contributes to the optimization of machine learning models and the advancement of cybersecurity solutions.
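The best VirusSample configuration, RF and DT base classifiers with a KNN meta-classifier, corresponds directly to scikit-learn's StackingClassifier, as in the sketch below; the simulated API-call features are placeholders.

```python
# Sketch of the reported stacking configuration: RF + DT base classifiers
# with a KNN meta-classifier. Features are simulated placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 50)                 # placeholder API-call features
y = np.random.randint(0, 2, 1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("dt", DecisionTreeClassifier())],
    final_estimator=KNeighborsClassifier(),  # meta-classifier on base outputs
    cv=5,                                    # out-of-fold predictions for the meta level
)
stack.fit(X_tr, y_tr)
print("stacking accuracy:", stack.score(X_te, y_te))
```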
- Research Article
- 10.3390/s23010240
- Dec 26, 2022
- Sensors
The Internet of Health Things (IoHT) has emerged as an attractive networking paradigm in wireless communications, integrated devices and embedded system technologies. In the IoHT, real-time health data are collected through smart healthcare sensors and, in recent years, the IoHT has started to play an important role in Internet of Things technology. Although the IoHT provides comfort in health monitoring, it also imposes security challenges in maintaining patient data confidentiality and privacy. To overcome such security issues, in this paper, a novel blockchain-based privacy-preserving authentication scheme is proposed as an approach for achieving efficient authentication of the patient without the involvement of a trusted entity. Moreover, a secure handover authentication mechanism is developed that avoids patient re-authentication in multi-doctor communication scenarios and revokes possible malicious misbehavior of medical professionals in IoHT communication with the patient. The performance of the proposed authentication and handover scheme is analyzed against existing state-of-the-art authentication schemes. The results of the performance analyses reveal that the proposed authentication scheme is resistant to different types of security attacks. Moreover, the results show that the proposed authentication scheme outperforms similar state-of-the-art schemes in terms of lower computational, communication and storage costs. Therefore, the novel authentication and handover scheme has proven its practical applicability and represents a valuable contribution to improving the security of communication in IoHT networks.
- Research Article
- 10.1186/s12889-025-21658-y
- Feb 4, 2025
- BMC Public Health
Background: Alveolar bone loss (ABL) is common in modern society, and heavy metal exposure is usually considered a risk factor for it. Some studies revealed a positive trend between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine regression. However, overfitting of kernel functions, long computation times, the need to define prior distributions, and the lack of a heavy-metal ranking can all affect the performance of such statistical models, and the optimal model for this topic remains controversial. This study aimed: (1) to develop an algorithm for exploring the association between heavy metal exposure and ABL; (2) to filter the actual causal variables and investigate how heavy metals are associated with ABL; and (3) to identify the potential risk factors for ABL. Methods: Data were collected from the National Health and Nutrition Examination Survey (NHANES) between 2015 and 2018 to develop a machine learning (ML) model. Feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. The selected data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) and divided into a training set and a testing set at a 3:1 ratio. Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), and XGBoost were used to construct the ML model. Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), Precision, Recall, and F1 score were used to select the optimal model for further analysis. The contribution of the variables to the ML model was explained using the Shapley Additive Explanations (SHAP) method. Results: RF showed the best performance in exploring the association between heavy metal exposure and ABL, with an AUC of 0.88, accuracy of 0.78, precision of 0.76, recall of 0.83, and F1 score of 0.79. Age was the most important factor in the ML model (mean |SHAP value| = 0.09), and Cd was the primary heavy-metal contributor. Sex had little effect on the model's output. Conclusion: In this study, RF showed superior performance compared with the other five algorithms. Among the 12 heavy metals, Cd was the most important factor in the ML model; the relationships of Co and Pb with ABL were weaker than that of Cd. Among all independent variables, age was considered the most important factor for this model. As for PIR, low-income participants showed an association with ABL. Mexican American and Non-Hispanic White participants showed a weaker association with ABL than Non-Hispanic Black participants and other races. The gender feature demonstrated a weak association with ABL. In the future, more advanced algorithms should be developed to validate these results, and related parameters can be tuned to improve the accuracy of the model. Clinical trial number: not applicable.
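A hedged sketch of the preprocessing chain described above (LASSO with 10-fold CV, SMOTE balancing, a 3:1 split, and an RF fit) using scikit-learn and imbalanced-learn; the NHANES-style variables below are simulated placeholders.

```python
# Hedged sketch: LASSO feature selection with 10-fold CV, SMOTE balancing,
# a 3:1 train/test split, and a random-forest fit. Data are placeholders.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((800, 20))                     # placeholder metals + covariates
y = (rng.random(800) < 0.25).astype(int)      # imbalanced ABL labels

# LASSO with 10-fold CV zeroes out coefficients of uninformative features.
lasso = LassoCV(cv=10, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:                        # fall back if all shrink to zero
    selected = np.arange(X.shape[1])
print("features kept by LASSO:", selected)

# SMOTE synthesizes minority-class samples before model fitting.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X[:, selected], y)
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```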
- Research Article
- 10.3390/agriengineering7120424
- Dec 10, 2025
- AgriEngineering
High-quality soybean seeds possess genetic, physical, and physiological characteristics that directly influence crop yield. The use of hyperspectral sensors combined with machine learning (ML) can streamline and accelerate seed germination testing. Therefore, the objectives of this study were: (i) to evaluate whether leaf and seed reflectance can effectively predict the physiological quality of soybean seeds using ML algorithms, and (ii) to identify which algorithm provides the highest prediction accuracy. Thirty-two soybean genotypes were evaluated in a controlled experiment. Leaves and seeds were analyzed using a hyperspectral sensor capable of measuring reflectance across the 350 to 2500 nm range. The resulting data were subjected to ML analysis with two types of input: spectral variables from leaves and from seeds. The output variables predicted included the germination test (GERM), electrical conductivity (EC), first germination count (FGC), vigorous tetrazolium test (VIG-TZ), and viable tetrazolium test (VIAB). Predictions were performed using stratified 10-fold cross-validation with ten repetitions (100 runs per model). All model parameters were set to the default configuration in Weka version 3.8.5. The ML models used for prediction included artificial neural networks (ANN), REPTree and M5P decision trees, random forest (RF), support vector machine (SVM), and ZeroR, with the latter serving as a control algorithm. The models showed consistent performance in predicting physiological variations in seeds, with better results when seed reflectance was used as input. For germination (GERM), the M5P, RF, and SVM algorithms obtained the highest correlations (r = 0.565–0.575). In predicting electrical conductivity (EC), M5P showed greater accuracy with leaf data (r = 0.506), while SVM performed best with seed data (r = 0.658). For the first germination count (FGC), M5P was the most accurate with leaf data (r = 0.720), while M5P, RF, and SVM showed r between approximately 0.735 and 0.777 with seed data. For tetrazolium vigor (VIG-TZ), RF showed the best performance (MAE 0.25), again highlighting seed reflectance, which resulted in the lowest errors and highest correlations. Overall, the M5P, RF, and SVM algorithms achieved the most robust results, especially when used with seed spectral data. The highest germination prediction accuracy was achieved by the M5P, SVM, and RF algorithms for both input types. Seed reflectance yielded the best accuracy and the lowest MAE and RMSE values. Leaf reflectance also enabled accurate predictions, indicating that this input can serve as an early, in-field strategy for predicting soybean seed physiological quality.
- Conference Article
- 10.5753/sbcas_estendido.2024.2349
- Jun 25, 2024
Monitoring people’s Quality of Life (QoL) has attracted interest due to the health benefits of an accurate QoL analysis, such as early healthcare interventions. However, most instruments to assess QoL are questionnaires, and their application is time-consuming, intrusive, and error-prone. This work proposes an Internet of Health Things (IoHT) platform called Healful that applies Machine Learning to infer users’ QoL. A case study with 44 participants was conducted for six months, during which health data were collected daily through smartphones and wearables. These data were processed and compiled into two datasets with 1,373 instances each. Next, five Machine Learning models were built using 10-fold cross-validation to estimate participants’ QoL. Random Forest (RF) achieved the best results in terms of Root Mean Squared Error (RMSE), with an RMSE of 7.8618 for the physical domain and 7.4591 for the psychological domain.
- Research Article
- 10.12989/gae.2021.25.1.001
- Jan 1, 2021
- Geomechanics and Engineering
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, the feature selection methods are used to obtain the importance of the conditioning factors and create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best optimized FS-ML model is selected according to the area under the curve (AUC) value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE)-optimized RF, and RFE is an optimal method for feature selection.
- Research Article
- 10.5812/iranjradiol-147913
- Jul 30, 2024
- Iranian Journal of Radiology
Background: Osteoporosis is a systemic skeletal disorder marked by reduced bone density and microarchitectural deterioration, leading to increased fracture risk. While the dual-energy X-ray absorptiometry (DEXA) scan is the World Health Organization (WHO)-recommended diagnostic standard, its limitations necessitate alternative methods. Emerging magnetic resonance imaging (MRI) techniques, radiomics, and machine learning promise to enhance osteoporosis diagnosis through detailed analysis of lumbar MRI apparent diffusion coefficient (ADC) maps, potentially revolutionizing early detection and treatment strategies. Objectives: In this study, we evaluate the performance of machine learning (ML) models using radiomics features of the lumbar MRI ADC map for osteoporosis detection, and identify significant features and their diagnostic thresholds. Specific performance metrics such as accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were assessed. Patients and Methods: This retrospective study employed a cross-sectional design, with a total of 140 cases, including 21 with osteoporosis. The study's inclusion criteria consisted of concurrent lumbar MRI and DEXA within a year, while exclusion criteria included infectious or neoplastic lumbar lesions, fractures, instrumentation, significant osteodegenerative changes, cases where the first four lumbar vertebrae were not included in the imaging field, and absence of diffusion-weighted imaging. Manual segmentation of lumbar vertebrae from ADC maps was performed to create a comprehensive dataset, comprising 5,580 radiomics features per case. Subsequently, the top five features selected by the fast correlation-based filter (FCBF) were used to test the performance of seven machine learning algorithms (k-nearest neighbors, decision tree, random forest, logistic regression, support vector machine, naive Bayes, and neural network). Statistical tests and ROC curve analysis were conducted to determine the significance and thresholds of these features. Results: The study included 140 cases, with 132 females (94.3%) and 8 males (5.7%), and a mean age of 65.32 ± 8.50 years. The mean BMI was 31.43 ± 5.53 kg/m² for females and 26 ± 3.59 kg/m² for males. In terms of demographic differences, no significant age difference was found between the osteoporotic and non-osteoporotic groups (P = 0.889). However, the osteoporotic group had significantly lower mean body weight (64.90 ± 10.13 kg vs. 74.68 ± 13.94 kg, P = 0.003) and BMI (27.40 ± 4.38 kg/m² vs. 31.77 ± 5.52 kg/m², P = 0.001) compared to the non-osteoporotic group. The median interval between DEXA and lumbar MRI was 1 month (range 0.1 - 11.87 months). The neural network model demonstrated the highest performance with an AUC of 0.616 and a classification accuracy of 0.764 using all features. The naive Bayes model, using the top five features selected by FCBF, showed the highest performance with an AUC of 0.913, accuracy of 0.907, sensitivity of 0.667, and specificity of 0.95. The performance of all ML models was improved by feature selection. Independent t-tests and Mann-Whitney U tests identified 521 and 670 significant features, respectively (P < 0.05). ROC analysis revealed 58 features with AUC values above 0.70. Conclusion: This study's findings suggest that ML models, particularly the naive Bayes algorithm, can effectively use lumbar ADC map radiomics to diagnose osteoporosis.
These findings could enhance early detection and treatment strategies, potentially improving patient outcomes and reducing the burden of osteoporotic fractures. This study also established threshold values for significant features.
- Research Article
- 10.3389/fpsyg.2024.1447968
- Oct 29, 2024
- Frontiers in psychology
A promising approach to optimizing recovery in youth football has been the use of machine learning (ML) models to predict recovery states and prevent mental fatigue. This research investigates the application of ML models in classifying young male football players in the under-15 (U15), U17, and U19 age groups according to their recovery state. Weekly training load data were systematically monitored across the three age groups throughout the initial month of the 2019-2020 competitive season, covering 18 training sessions and 120 observation instances. Outfield players were tracked using portable 18-Hz global positioning system (GPS) devices, while heart rate (HR) was measured using 1-Hz telemetry HR bands. The rating of perceived exertion (RPE, 6-20) score was employed to evaluate perceived exertion and internal training load, and the total quality recovery (TQR, 6-20) score to evaluate recovery state. Data preprocessing involved handling missing values, normalization, and feature selection using correlation coefficients and a random forest (RF) classifier. Five ML algorithms [K-nearest neighbors (KNN), extreme gradient boosting (XGBoost), support vector machine (SVM), RF, and decision tree (DT)] were assessed for classification performance. The K-fold method was employed to cross-validate the ML outputs, and high classification accuracy (73-100%) was verified. The feature selection highlighted critical variables, and we implemented the ML algorithms considering a panel of 9 variables (U15, U19, body mass, accelerations, decelerations, training weeks, sprint distance, and RPE). These features were included according to their percentage of importance (3-18%). The results were cross-validated with good accuracy across 5 folds (79%). The five ML models, in combination with weekly data, demonstrated the efficacy of wearable-device-collected features as an efficient combination for predicting football players' recovery states.
- Preprint Article
- 10.21203/rs.3.rs-6910943/v1
- Jun 26, 2025
Self-consolidating concrete (SCC) is a non-segregating, highly flowable concrete that improves construction efficiency, especially in complex shapes and areas of high reinforcement density. Precise prediction of its rheological properties, i.e., yield stress and plastic viscosity, is critical to ensure quality during mixing, transportation, and placing operations. Traditional testing procedures are labor-intensive, expensive, and prone to human error. This research explores the use of machine learning (ML) models, i.e., Gene Expression Programming (GEP), Deep Neural Networks (DNN), Decision Trees (DT), Support Vector Machines (SVM), and Random Forests (RF), for accurate SCC rheological property prediction. The models were trained and validated using a dataset containing mix design parameters and experimental measurements of rheological properties, and feature selection techniques were used to determine critical influencing factors such as the water-to-cement ratio, aggregate composition, and admixture dosage. Model performance was evaluated in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R². The GEP and DNN models were determined to be better, with R² values of 0.93 and 0.89 for V-funnel time, and 0.81 for slump flow prediction. To provide insights into model predictions and explore the contribution of influential mix design factors, SHAP and PDP analyses were performed. The results validate that ML models, i.e., GEP and DNN, can accurately predict SCC rheological properties, thus eliminating extensive experimental testing and providing useful insights for optimal mix design.
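Partial dependence, one of the two interpretation tools applied here, can be sketched with scikit-learn; the gradient-boosting regressor below is a stand-in for the GEP/DNN models, and the mix-design features and rheology target are simulated assumptions.

```python
# Sketch of the partial-dependence step; model, features, and target are
# stand-ins, not the study's data or architecture.
import matplotlib
matplotlib.use("Agg")                          # headless plotting backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "w_c_ratio": rng.uniform(0.3, 0.5, 300),     # water-to-cement ratio
    "agg_fraction": rng.uniform(0.4, 0.7, 300),  # aggregate composition
    "sp_dosage": rng.uniform(0.5, 2.0, 300),     # superplasticizer dosage
})
# Toy target standing in for measured plastic viscosity.
y = 50 * X["w_c_ratio"] - 10 * X["sp_dosage"] + rng.normal(0, 1, 300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Partial dependence shows each factor's marginal effect on the prediction,
# averaged over the remaining features.
PartialDependenceDisplay.from_estimator(model, X, features=["w_c_ratio", "sp_dosage"])
plt.savefig("pdp_mix_factors.png")
```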