User Clustering with Spatial Concept using Supervised Learning for NOMA Downlink
This research aims to optimize the performance of Successive Interference Cancellation (SIC) in Power Domain Non-Orthogonal Multiple Access (PD-NOMA) by applying spatial concepts through beamforming techniques. User clustering is a key element in achieving this goal, and this research applies several supervised machine learning classification algorithms: Decision Tree, K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Random Forest, Logistic Regression, and Naive Bayes. The experimental results show that Random Forest achieves the highest accuracy in classifying users, followed by Decision Tree. Decision Tree and Random Forest also achieved the best results when performance was measured with ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve). In terms of processing time, Decision Tree is faster than Random Forest. Overall, the Random Forest and Decision Tree algorithms are suitable for user clustering in PD-NOMA that exploits the spatial relationship between users and the Base Station (BS).
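As a rough illustration of supervised user clustering, the sketch below uses K-NN, one of the classifiers named above, to assign a new user to a cluster from its position relative to the BS. The feature choice (x/y coordinates in metres), the cluster labels `beam_A` and `beam_B`, and all sample values are hypothetical; the abstract does not state which spatial features or labels were actually used.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority cluster label among the k nearest training users."""
    # Rank labelled users by Euclidean distance to the query position.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical user positions (metres) relative to a BS at the origin,
# with hypothetical cluster labels standing in for beam assignments.
users = [(100, 10), (120, -5), (90, 20), (-80, 60), (-100, 40), (-90, 75)]
clusters = ["beam_A", "beam_A", "beam_A", "beam_B", "beam_B", "beam_B"]

print(knn_predict(users, clusters, (110, 0)))
print(knn_predict(users, clusters, (-95, 50)))
```

An odd `k` with two labels avoids tied votes; with more clusters a tie-breaking rule would be needed.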
- Research Article
13
- 10.1186/s12880-023-01106-2
- Oct 16, 2023
- BMC Medical Imaging
Background: There is a paucity of research investigating the application of machine learning techniques for distinguishing between lipid-poor adrenal adenoma (LPA) and subclinical pheochromocytoma (sPHEO) based on radiomic features extracted from non-contrast and dynamic contrast-enhanced computed tomography (CT) scans of the abdomen. Methods: We conducted a retrospective analysis of multiphase spiral CT scans, including non-contrast, arterial, venous, and delayed phases, as well as thin- and thick-slice images, from 134 patients with surgically and pathologically confirmed diagnoses. A total of 52 patients with LPA and 44 patients with sPHEO were randomly assigned to training/testing sets in a 7:3 ratio. Additionally, a validation set comprised 22 LPA cases and 16 sPHEO cases from two other hospitals. We used 3D Slicer and PyRadiomics to segment tumors and extract radiomic features, respectively. We then applied the t-test and least absolute shrinkage and selection operator (LASSO) to select features. Six binary classifiers, including K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP), were employed to differentiate LPA from sPHEO. Receiver operating characteristic (ROC) curves and area under the curve (AUC) values were compared using DeLong’s method. Results: All six classifiers showed good diagnostic performance for each phase and slice thickness, as well as for the entire CT data, with AUC values ranging from 0.706 to 1. Non-contrast CT densities of LPA were significantly lower than those of sPHEO (P < 0.001). However, using the optimal threshold for non-contrast CT density, sensitivity was only 0.743, specificity 0.744, and AUC 0.828. Delayed phase CT density yielded a sensitivity of 0.971, specificity of 0.641, and AUC of 0.814. 
In radiomics, AUC values for the testing set using non-contrast CT images were: KNN 0.919, LR 0.979, DT 0.835, RF 0.967, SVM 0.979, and MLP 0.981. In the validation set, AUC values were: KNN 0.891, LR 0.974, DT 0.891, RF 0.964, SVM 0.949, and MLP 0.979. Conclusions: The machine learning model based on CT radiomics can accurately differentiate LPA from sPHEO, even using non-contrast CT data alone, making contrast-enhanced CT unnecessary for diagnosing LPA and sPHEO.
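For readers unfamiliar with how an AUC value such as those above is obtained, here is a minimal sketch. It relies on the standard interpretation of ROC AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up illustration values, and the function does not implement DeLong's comparison.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four cases (1 = positive class).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases; for large cohorts a rank-based formulation gives the same value more efficiently.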
- Research Article
- 10.3389/fmed.2025.1516476
- Nov 25, 2025
- Frontiers in Medicine
Background: This study aimed to investigate the effect of gastrointestinal bleeding (GIB) on the short-term survival of hepatitis B virus-related acute-on-chronic liver failure (HBV-ACLF) patients, establish a prediction model for HBV-ACLF-related GIB via machine learning (ML) algorithms, and compare the predictive ability of various models. Methods: A total of 583 HBV-ACLF patients from two medical centers were retrospectively enrolled, and patients from one of the centers were randomly divided into a training cohort (n = 360) and a test cohort (n = 153) at a 7:3 ratio. Patients from the other center composed the validation cohort (n = 70). Patients were divided into GIB and non-gastrointestinal bleeding (NGIB) groups according to whether they had GIB during hospitalization, and short-term survival rates were compared between the two groups. Least absolute shrinkage and selection operator (LASSO) regression was used to screen for features associated with GIB. On the basis of the screened features, we used five ML algorithms, namely, logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbors (KNN), to build a prediction model for GIB. Six metrics, namely, accuracy, area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were used to evaluate the predictive ability of these models. Results: In the training cohort, patients in the GIB group had significantly lower 30- and 90-day survival rates than did those in the NGIB group (48.72% versus 85.67% and 10.26% versus 64.80%, respectively), and similar results were obtained in the test cohort and the validation cohort. LASSO regression screened seven features associated with GIB, of which portal hypertension, electrolyte disturbance, and white blood cell counts were modeled features common to the five machine learning prediction models. 
The AUCs of the LR, SVM, DT, RF, and KNN models in the training cohort were 0.819, 0.924, 0.661, 1.000, and 0.865, respectively. Compared with the other four models, the LR model had the lowest PPV of 0.202 in the test cohort, the SVM model had the lowest AUC and sensitivity of 0.657 and 0.500 in the validation cohort, the DT model had the lowest sensitivity of 0.436 and 0.438 in the training and test cohorts, respectively, and the KNN model had the lowest PPV of 0.250 in the validation cohort. Notably, the RF model had the least fluctuation in accuracy, AUC, sensitivity, specificity, PPV, and NPV among the 3 cohorts, with good overall predictive ability. Conclusion: GIB has a significant effect on short-term survival in patients with HBV-ACLF. On this basis, five ML prediction models (LR, SVM, DT, RF, and KNN) were established to predict GIB, among which the RF model had the most robust predictive performance, which can help clinicians intervene in advance and improve the short-term survival rate of patients.
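The six metrics listed above, apart from AUC, all follow from the four cells of a binary confusion matrix. The sketch below computes them using the standard definitions. The counts are hypothetical and chosen only to show how PPV can be low while specificity and NPV remain acceptable when the predicted event is uncommon; they are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical cohort of 200 patients in which 20 had the event.
metrics = diagnostic_metrics(tp=10, fp=40, tn=140, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, only 10 of the 50 positive predictions are correct, so PPV is 0.2 even though accuracy is 0.75.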
- Research Article
51
- 10.1016/j.imed.2021.12.003
- Jan 10, 2022
- Intelligent Medicine
Applying data mining techniques to classify patients with suspected hepatitis C virus infection
- Research Article
3
- 10.21037/qims-24-1914
- Feb 1, 2025
- Quantitative imaging in medicine and surgery
The course of patients with type B aortic intramural hematoma (IMH) is unstable, and different studies have shown that the evolution of this type of IMH is highly heterogeneous. This study sought to explore the value of radiomics in predicting the prognosis of type B aortic IMH, and to develop and validate a prediction model of type B aortic IMH progression. A total of 119 patients with type B aortic IMH who had not undergone surgical or thoracic endovascular aortic repair treatment were enrolled in this study. These patients were divided into the progressive group (n=61) and stable group (n=58) based on re-examination aortic computed tomography angiography (CTA) imaging. The patients were then randomly divided into the training cohort (n=95) and the validation cohort (n=24). The uAI Research Portal (URP) was used to perform the radiomics feature extraction of the intensity, shape, texture, and gradient features. Next, the least absolute shrinkage and selection operator (LASSO) logistic regression (LR) method was used for feature selection, and prediction models were constructed based on clinical features, CTA imaging features, and radiomic features. Different machine-learning algorithms were used to build the models, including random forest (RF), support vector machine (SVM), LR, K-nearest neighbor (KNN), decision tree, and stochastic gradient descent (SGD) algorithms. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, and F1 score were used to evaluate the efficacy of the prediction models. After the application of the LASSO method, 12 radiomic features were selected from an initial pool of 1,004 radiomic features, 12 features were selected from the 21 clinical features, and 11 features were selected from the 15 CTA imaging features. Five predictive models were then constructed using distinct combinations of feature sets. 
For the test set, the AUC of the SVM algorithm in the radiomics model was the highest (0.833), that of the KNN algorithm in the clinical model was the highest (0.701), that of the RF algorithm in the CTA imaging model was the highest (0.806), and those of the LR and SGD algorithms in the clinical + CTA imaging model were the highest (both 0.792). The combined radiomics + clinical + CTA model had the highest AUC value (0.917), which was higher than that of the single radiomics model (0.833), CTA model (0.806), clinical + CTA model (0.792), and clinical model (0.701). The sensitivity, specificity, accuracy, precision and F1 scores of the combined radiomics + clinical + CTA model were all >0.75. The comprehensive model that incorporated clinical, CTA imaging, and radiomic features performed the best and accurately predicted the progression of type B aortic IMH. This model could help clinicians make optimal treatment decisions.
- Research Article
17
- 10.1007/s00062-021-01040-2
- Jun 22, 2021
- Clinical Neuroradiology
The objective of this study was to predict hematoma expansion (HE) by radiomic models based on different machine learning methods and to determine the best radiomic model through comparison. A total of 108 patients with intracerebral hemorrhage were retrospectively evaluated. Images of baseline non-contrast computed tomography (NCCT) and follow-up NCCT scans within 24 h were retrospectively reviewed. HE was defined as a volume increase of more than 33% or an increase greater than 12.5 mL from the volume of the baseline NCCT. Texture parameters of the baseline NCCT images were selected by least absolute shrinkage and selection operator (LASSO) regression. We used support vector machine (SVM), decision tree (DT), conditional inference trees (CIT), random forest (RF), k‑nearest neighbors (KNN), back-propagation neural network (BPNet) and Bayes to build models. Receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were performed and compared among models. Every model had a relatively high AUC (all > 0.75); SVM and KNN had the highest AUC of 0.91. There were significant differences between SVM and CIT (Z > 2.266, p = 0.02345), KNN and CIT (Z = 2.4834, p = 0.01301), RF and CIT (Z = 2.6956, p = 0.007027), KNN and BPNet (Z = 2.0122, p = 0.0442), and RF and BPNet (Z = 1.9793, p = 0.04778). There was no significant difference among SVM, DT, RF, KNN and Bayes (p > 0.05). The SVM obtained the largest net benefit when the threshold probability was less than 0.33, while KNN obtained the largest net benefit when the threshold probability was greater than 0.33. Combining ROC and DCA, SVM and KNN performed better than the other models for predicting HE. Radiomic models based on different machine learning methods can be used to predict HE, and the models generated by SVM and KNN performed best.
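The HE definition stated above (growth of more than 33% or more than 12.5 mL relative to baseline) can be written as a small labelling rule. This is a minimal sketch; the function name and the example volumes are hypothetical.

```python
def is_hematoma_expansion(baseline_ml, followup_ml,
                          rel_threshold=0.33, abs_threshold_ml=12.5):
    """Label a case as HE if growth exceeds 33% or 12.5 mL of baseline volume."""
    if baseline_ml <= 0:
        raise ValueError("baseline volume must be positive")
    growth_ml = followup_ml - baseline_ml
    relative_growth = growth_ml / baseline_ml
    return relative_growth > rel_threshold or growth_ml > abs_threshold_ml


# Hypothetical baseline/follow-up volumes in mL.
print(is_hematoma_expansion(10.0, 14.0))   # 40% growth
print(is_hematoma_expansion(50.0, 60.0))   # 20% growth, 10 mL
print(is_hematoma_expansion(50.0, 63.0))   # 26% growth, 13 mL
```

The two criteria are joined with "or", so a large hematoma can qualify on absolute growth alone even when its relative growth is below 33%.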
- Research Article
11
- 10.1007/s11356-023-28730-3
- Jul 13, 2023
- Environmental Science and Pollution Research
Landslides are a common natural disaster with severe socio-economic effects, and they pose an immense threat to safety worldwide, including loss of life. Modeling and predicting the possibility of landslides are important in order to monitor and prevent their negative consequences. In this study, landslides are the primary research object. The frequency ratio (FR) method was combined with the random forest (RF), support vector machine (SVM), and decision tree (DT) regression algorithms for landslide susceptibility assessment. It was also applied to landslide risk assessment mapping in the Longmen Mountain area. To balance positive and negative samples, 7774 historical landslide points and 7774 non-landslide points were selected and divided into training and test sets. The influence factors were selected and analyzed through multicollinearity analysis and the FR method. To improve the performance of the model and the accuracy of the findings, the individual environmental factors were normalized. Subsequently, the landslide susceptibility index (LSI) was obtained by calculating the frequency ratio. Following this, RF, SVM, and DT were used to construct the models. The trained model calculates the landslide probability of each cell in the study area and generates the resultant susceptibility map. The receiver operating characteristic (ROC) curve and R² were calculated to evaluate the model's performance. The results indicate that RF obtained the highest predictive performance (area under the curve (AUC) = 0.82) in landslide risk prediction, followed by SVM (AUC = 0.8) and DT (AUC = 0.69). The results of this study serve as a predictive map of landslide-susceptible areas and provide critical support for the security of lives and property and for socio-economic development in the Longmen Mountain region. 
In addition, the experiment results reveal that the machine learning model based on the FR method can improve the accuracy and performance of methods in studies related to landslide susceptibility. The method is equally applicable to research in other fields.
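The frequency ratio mentioned above is commonly defined, for each class of an influence factor, as the share of landslides falling in that class divided by the share of the study area the class occupies. The sketch below assumes that common definition; the abstract does not give the exact formulation used, and the slope classes and counts are hypothetical.

```python
def frequency_ratio(landslides_in_class, cells_in_class):
    """FR per class = (share of landslides in class) / (share of area in class)."""
    total_landslides = sum(landslides_in_class.values())
    total_cells = sum(cells_in_class.values())
    ratios = {}
    for name, cells in cells_in_class.items():
        landslide_share = landslides_in_class.get(name, 0) / total_landslides
        area_share = cells / total_cells
        ratios[name] = landslide_share / area_share
    return ratios


# Hypothetical slope-angle classes: grid cells per class and landslides observed.
cells = {"0-15 deg": 5000, "15-30 deg": 3000, ">30 deg": 2000}
landslides = {"0-15 deg": 10, "15-30 deg": 40, ">30 deg": 50}

for name, fr in frequency_ratio(landslides, cells).items():
    print(f"{name}: FR = {fr:.2f}")
```

An FR above 1 indicates a class in which landslides are over-represented relative to its area, and a value below 1 indicates the opposite.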
- Research Article
13
- 10.1002/esp.5888
- Jun 19, 2024
- Earth Surface Processes and Landforms
This research introduces an innovative approach by utilising rock glaciers (RGs) as a proxy for mapping debris‐covered glaciers (DCGs). This approach focuses on the interconnected nature of glaciers, DCGs and RGs in a continuum where DCGs can transform into RGs over time due to various processes. This study utilises six machine learning models—logistic regression (LR), support vector machine (SVM), K‐nearest neighbour (KNN), Naïve Bayes (NB), decision tree (DT) and random forest (RF)—combined with multispectral satellite data (Sentinel‐2 and Landsat 8) and topographical data derived from ALOS PALSAR DEM. Performance metrics such as accuracy, area under the curve (AUC) score, precision, recall and F1‐score were evaluated to assess model performance. This detailed mapping provides a precise estimation of the extent of DCGs in the Kinnaur district. The estimated DCG areas revealed intriguing variation across models, with RF (9.71%), KNN (9.67%) and NB (9.41%) yielding similar predictions. SVM (11.61%) projected a slightly larger DCG area, whereas DT (5.54%) and LR (25.55%) provided contrasting results. Validation against high‐resolution satellite images, Google Earth images and glacier inventories confirmed the accuracy and reliability of our approach. Based on our findings for our specific study, the most effective method for mapping DCGs is RF, followed by KNN, NB, DT and SVM. The combination of machine learning models and RG data presents a novel and promising approach to remote sensing‐based DCG mapping, with potential applications for other regions and broader environmental studies.
- Research Article
3
- 10.1038/s41598-025-91434-w
- Apr 2, 2025
- Scientific Reports
Disseminated intravascular coagulation (DIC) is a thrombo-hemorrhagic disorder that can be life-threatening in critically ill children, and the quest for an accurate and efficient method for early DIC prediction is of paramount importance. Candidate predictors encompassed demographics, comorbidities, laboratory findings, and therapy strategies. A stepwise logistic regression model was employed to select the features included in the final model. Six machine learning algorithms, namely logistic regression (LR), extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM), decision tree (DT), and k-nearest neighbor (KNN), were employed to construct predictive models for DIC in critically ill children. Models were then evaluated using area under the curve (AUC), accuracy, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), precision, recall and decision curve analysis (DCA). Interpretation of the optimal model was conducted using SHapley Additive exPlanations (SHAP). A total of 6093 critically ill children were included in this study, of whom 681 (11.2%) developed DIC. The RF model exhibited the highest levels of accuracy (0.856), sensitivity (0.866), Kappa (0.472), NPV (0.423), and recall (0.866). However, the XGB model outperformed RF, LR, SVM, DT, and KNN in terms of AUC (0.908), specificity (0.859), PPV (0.978), and precision (0.969). Decision curve analysis (DCA) confirmed the superior clinical utility of the XGB model compared to RF, LR, SVM, DT, and KNN. We named the final model Alfalfa-PICU-DIC. SHAP analysis identified D-dimer, INR, PT, TT, and PLT count as the top predictors of DIC. Machine learning models can be a reliable tool for predicting DIC in critically ill children, which will facilitate timely intervention, thereby reducing the burden of DIC on patients in the pediatric intensive care unit (PICU).
- Research Article
4
- 10.1038/s41598-024-66979-x
- Jul 13, 2024
- Scientific Reports
The study aims to investigate the predictive capability of machine learning algorithms for omental metastasis in locally advanced gastric cancer (LAGC) and to compare the performance metrics of various machine learning predictive models. A retrospective collection of 478 pathologically confirmed LAGC patients was undertaken, encompassing both clinical features and arterial phase computed tomography images. Radiomic features were extracted using 3D Slicer software. Clinical and radiomic features were further filtered through lasso regression. Selected clinical and radiomic features were used to construct omental metastasis predictive models using support vector machine (SVM), decision tree (DT), random forest (RF), K-nearest neighbors (KNN), and logistic regression (LR). The models’ performance metrics included accuracy, area under the curve (AUC) of the receiver operating characteristic curve, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In the training cohort, the RF predictive model surpassed LR, SVM, DT, and KNN in terms of accuracy, AUC, sensitivity, specificity, PPV, and NPV. Compared to the other four predictive models, the RF model significantly improved PPV. In the test cohort, all five machine learning predictive models exhibited lower PPVs. The DT model demonstrated the most significant variation in performance metrics relative to the other models, with a sensitivity of 0.231 and specificity of 0.990. The LR-based predictive model had the lowest PPV at 0.210, compared to the other four models. In the external validation cohort, the performance metrics of the predictive models were generally consistent with those in the test cohort. The LR-based model for predicting omental metastasis exhibited a lower PPV. Among the machine learning algorithms, the RF predictive model demonstrated higher accuracy and improved PPV relative to LR, SVM, KNN, and DT models.
- Research Article
13
- 10.1038/s41598-025-85945-9
- Jan 11, 2025
- Scientific Reports
Knee osteoarthritis (KOA) is a progressive degenerative disorder characterized by the gradual erosion of articular cartilage. This study aimed to develop and validate biomarker-based predictive models for KOA diagnosis using machine learning techniques. Clinical data from 2594 samples were obtained and stratified into training and validation datasets in a 7:3 ratio. Key clinical features were identified through differential analysis between KOA and control groups, combined with least absolute shrinkage and selection operator (LASSO) regression. The SHapley Additive exPlanations (SHAP) method was employed to rank feature importance quantitatively. Based on these rankings, predictive models were constructed using Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (xGBoost), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) algorithms. Models were developed for subsets of variables, including the top 5, top 10, top 15, and all identified features. Receiver operating characteristic (ROC) curves were applied to compare diagnostic performance across models. Additionally, a risk stratification framework for KOA prediction was designed using recursive partitioning analysis (RPA). Using difference analysis and LASSO, 44 critical clinical features were identified. Among these, age, plasma prothrombin time, gender, body mass index (BMI), and prothrombin time and international normalized ratio (PTINR) emerged as the top five features, with SHAP values of 0.1990, 0.0981, 0.0471, 0.0433, and 0.0422, respectively. Machine learning analysis demonstrated that these variables provided robust diagnostic performance for KOA. In the training set, area under the curve (AUC) values for LR, RF, xGBoost, NB, SVM, and DT models were 0.947, 0.961, 0.892, 0.952, 0.885, and 0.779, respectively. Similarly, in the validation dataset, these models achieved AUC values of 0.961, 0.943, 0.789, 0.957, 0.824, and 0.76. 
Among them, RF consistently exhibited superior diagnostic accuracy for KOA. Additionally, RPA analysis indicated a higher prevalence of KOA among individuals aged 54 years and older. The integration of the top five clinical variables significantly enhanced the diagnostic accuracy for KOA, particularly when employing the RF model. Moreover, the RPA model offered valuable insights to assist clinicians in refining prognostic assessments and optimizing clinical decision-making processes.
- Research Article
16
- 10.1186/s13005-024-00446-w
- Aug 30, 2024
- Head & Face Medicine
Background: Cranial, facial, nasal, and maxillary widths have been shown to be significantly affected by the individual’s sex. The present study aims to use measurements of dental arch and maxillary skeletal base to determine sex, employing supervised machine learning. Materials and methods: Maxillary and mandibular tomographic examinations from 100 patients were analyzed to investigate the inter-premolar width, inter-molar width, maxillary width, inter-pterygoid width, nasal cavity width, nostril width, and maxillary length, obtained through Cone Beam Computed Tomography scans. The following machine learning algorithms were used to build the predictive models: Logistic Regression, Gradient Boosting Classifier, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Multi-Layer Perceptron Classifier (MLP), Decision Tree, and Random Forest Classifier. A 10-fold cross-validation approach was adopted to validate each model. Metrics such as area under the curve (AUC), accuracy, recall, precision, and F1 Score were calculated for each model, and Receiver Operating Characteristic (ROC) curves were constructed. Results: Univariate analysis showed statistical significance (p < 0.10) for all skeletal and dental variables. Nostril width showed greater importance in two models, while inter-molar width stood out among dental measurements. The models achieved accuracy values ranging from 0.75 to 0.85 on the test data. Logistic Regression, Random Forest, Decision Tree, and SVM models had the highest AUC values, with SVM showing the smallest disparity between cross-validation and test data for accuracy metrics. Conclusion: Transverse dental arch and maxillary skeletal base measurements exhibited strong predictive capability, achieving high accuracy with machine learning methods. Among the evaluated models, the SVM algorithm exhibited the best performance. This indicates potential usefulness in forensic sex determination.
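To make the 10-fold cross-validation step concrete, the following sketch generates the train/validation index splits. It assumes a plain shuffled (non-stratified) split; the abstract does not say whether folds were stratified by sex or how many of the 100 patients entered cross-validation, so the sample count used here is illustrative only.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Illustrative sample count; each sample is held out exactly once.
for fold_number, (train, validation) in enumerate(k_fold_indices(100), start=1):
    print(fold_number, len(train), len(validation))
```

Each model would be fitted on `train` and scored on `validation` in every fold, and the fold scores averaged to give the cross-validation estimate.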
- Research Article
1
- 10.4103/jfmpc.jfmpc_2025_23
- Jul 26, 2024
- Journal of family medicine and primary care
Hyperkalemia is a potentially life-threatening electrolyte disturbance that, if not diagnosed in time, may lead to devastating conditions and sudden cardiac death. Blood sampling to check potassium levels is time-consuming and can delay the treatment of severe hyperkalemia. We therefore propose a non-invasive method for accurate and rapid hyperkalemia detection. The cardiac signal of patients referred to the Pediatrics Emergency room of Shahid Rejaee Hospital was measured by a 12-lead Philips electrocardiogram (ECG) device. Immediately afterwards, the patients' blood samples were sent to the laboratory for serum potassium level determination. We defined 16 features for each cardiac signal at lead 2 and extracted them automatically using the algorithm developed. Dimension reduction was performed with the principal component analysis (PCA) algorithm. The decision tree (DT), random forest (RF), logistic regression, and support vector machine (SVM) algorithms were used to classify serum potassium levels. Finally, we used the receiver operating characteristic (ROC) curve to display the results. Over a period of 5 months, 126 patients with a serum potassium level above 4.5 (hyperkalemia) and 152 patients with a serum potassium level below 4.5 (normal potassium) were included in the study. Classification with the RF algorithm gave the best result. Accuracy, Precision, Recall, F1, and area under the curve (AUC) of this algorithm are 0.71, 0.87, 0.53, 0.66, and 0.69, respectively. A lead 2-based RF classification model may help clinicians to rapidly detect severe dyskalemias as a non-invasive method and prevent life-threatening cardiac conditions due to hyperkalemia.
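The reported F1 value can be checked against the reported precision and recall, since F1 is their harmonic mean. A minimal sketch of that arithmetic follows; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the RF model above.
print(round(f1_score(0.87, 0.53), 2))  # 0.66
```

The result agrees with the F1 of 0.66 given in the abstract.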
- Research Article
- 10.3791/69238
- Feb 13, 2026
- Journal of visualized experiments : JoVE
Low back pain (LBP) is a leading cause of disability and reduced quality of life globally, with discogenic low back pain (DLBP) accounting for 39% of cases. Accurate diagnosis of LBP etiology is challenging due to the lack of reliable methods. This study aims to improve DLBP diagnostic efficiency using lumbar spine MRI T2 data combined with radiomics and machine learning. This retrospective study analyzed MRI data from 81 DLBP patients and 162 healthy controls. Radiomics features, clinical data, and high-intensity zone (HIZ) imaging features were extracted. The data were divided into four groups (d0, d1, d2, D), and 20 predictive models were built using Random Forest (RF), Decision Tree (TREE), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression (LOG). Model performance was evaluated using Receiver Operating Characteristic (ROC) area under the curve (AUC), precision recall (PR) AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. SHapley Additive exPlanations (SHAP) were applied to interpret the most significant features. The Random Forest in group D showed the best performance, with ROC AUCs of 0.9861 (train) and 0.9580 (test), PR AUCs of 0.9813 and 0.9179, and F1 scores of 0.9254 and 0.8148, respectively. SHAP analysis identified first-order kurtosis as the top feature contributing to DLBP diagnosis. The Random Forest model with SHAP analysis significantly improved DLBP diagnosis, offering high performance and interpretability to enhance clinical decision-making.
- Research Article
16
- 10.1097/cm9.0000000000002837
- Oct 12, 2023
- Chinese Medical Journal
Background: Acute pulmonary embolism (APE) is a fatal cardiovascular disease, yet missed diagnosis and misdiagnosis often occur due to non-specific symptoms and signs. A simple, objective technique will help clinicians make a quick and precise diagnosis. In population studies, machine learning (ML) plays a critical role in characterizing cardiovascular risks, predicting outcomes, and identifying biomarkers. This work sought to develop an ML model for helping APE diagnosis and compare it against current clinical probability assessment models. Methods: This is a single-center retrospective study. Patients with suspected APE were continuously enrolled and randomly divided into two groups including training and testing sets. A total of 8 ML models, including random forest (RF), Naïve Bayes, decision tree, K-nearest neighbors, logistic regression, multi-layer perceptron, support vector machine, and gradient boosting decision tree were developed based on the training set to diagnose APE. Thereafter, the model with the best diagnostic performance was selected and evaluated against the current clinical assessment strategies, including the Wells score, revised Geneva score, and Years algorithm. Eventually, the ML model was internally validated to assess the diagnostic performance using receiver operating characteristic (ROC) analysis. Results: The ML models were constructed using eight clinical features, including D-dimer, cardiac troponin T (cTNT), arterial oxygen saturation, heart rate, chest pain, lower limb pain, hemoptysis, and chronic heart failure. Among eight ML models, the RF model achieved the best performance with the highest area under the curve (AUC) (AUC = 0.774). Compared to the current clinical assessment strategies, the RF model outperformed the Wells score (P = 0.030) and was not inferior to any other clinical probability assessment strategy. 
The AUC of the RF model for diagnosing APE onset in the internal validation set was 0.726. Conclusions: Based on the RF algorithm, a novel prediction model was finally constructed for APE diagnosis. When compared to the current clinical assessment strategies, the RF model achieved better diagnostic efficacy and accuracy. Therefore, the ML algorithm can be a useful tool in assisting with the diagnosis of APE.
- Research Article
26
- 10.1038/s41598-023-31272-w
- Mar 13, 2023
- Scientific Reports
Gastric cancer (GC), with a 5-year survival rate of less than 40%, is the fourth leading cause of cancer-related mortality worldwide. This study aims to develop predictive models using different machine learning (ML) classifiers based on both demographic and clinical variables to predict the metastasis status of patients with GC. The data comprised 733 GC patients diagnosed at Taleghani tertiary hospital, divided into training and test groups at a ratio of 8:2. To predict metastasis in GC, ML-based algorithms, including Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (RT) and Logistic Regression (LR), were applied with 5-fold cross validation. To assess model performance, F1 score, precision, sensitivity, specificity, area under the curve (AUC) of the receiver operating characteristic (ROC) curve and precision-recall AUC (PR-AUC) were obtained. Of the 733 patients with GC, 262 (36%) experienced metastasis. Although all models performed well, the indices of the SVM model appear more appropriate (training set: AUC: 0.94, Sensitivity: 0.94; testing set: AUC: 0.85, Sensitivity: 0.92). NN has a higher AUC among the ML approaches (training set: AUC: 0.98; testing set: AUC: 0.86). The RF model, which identified tumor size and age as two essential variables, is considered the third most efficient model because of its higher specificity and AUC (84% and 87%). Based on demographic and clinical characteristics, ML approaches can predict the metastasis status of GC patients. According to AUC, sensitivity and specificity, SVM and NN can be regarded as the better algorithms among the 6 applied ML-based methods.