Leveraging machine learning to predict de novo skin malignancy following lung transplantation
Aims: This study aimed to predict post-transplant malignancy risks at multiple levels among lung transplant recipients using machine learning (ML) and to identify key clinical and immunogenetic predictors.
Materials and methods: A dataset of 30,917 lung transplant recipients with no prior cancer history was analyzed using pre-, peri-, and post-transplant variables. Multiple ML algorithms (gradient boosting, random forest, neural networks, and logistic regression) were applied to predict: (1) overall de novo malignancies (DNM), (2) skin versus non-skin cancers, and (3) skin cancer subtypes, including basal cell carcinoma (BCC) and squamous cell carcinoma (SCC).
Results: Gradient boosting achieved the highest AUC for overall malignancies (0.746) and for skin versus non-skin cancers (0.642), while random forest performed best for BCC versus SCC classification (AUC = 0.726). Significant predictors included HLA-DR alleles (DR52, DR1, DR53), A locus mismatch, recipient ethnicity, BMI, serum albumin, CMV/EBV serostatus, and cardiac-related measures (LV remodeling, cardiac output, prior cardiac surgery). Additional subtype predictors included peak PRA Class I sensitization, insulin signaling, donor-derived transfusions, and waiting list duration.
Conclusions: ML-driven predictive modeling enables personalized assessment of post-transplant malignancy risk, supporting early detection, targeted surveillance, and optimized long-term care for lung transplant recipients.
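The abstract's core protocol — fitting several classifier families and ranking them by AUC — can be sketched as follows. This is a minimal illustration on synthetic data, not the study's 30,917-recipient dataset; the feature counts and class balance are assumptions.

```python
# Hypothetical sketch: benchmark the classifier families named in the abstract
# (gradient boosting, random forest, logistic regression) by ROC AUC.
# Synthetic, imbalanced data stand in for the transplant registry.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
# AUC is computed from predicted probabilities of the positive class.
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
```

Ranking `aucs` then identifies the best-performing family per prediction task, as the study did for DNM, skin versus non-skin, and BCC versus SCC.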
- Conference Article
- 10.2118/228674-ms
- Aug 4, 2025
In geologically diverse regions like Colombia, the geothermal gradient plays a critical role in energy exploration and resource management. It directly impacts the definition of subsurface temperature profiles, the characterization of geothermal reservoirs, and the economic feasibility of energy extraction projects. Accurate prediction of geothermal gradients is essential for sustainable and efficient energy development, making it a high priority for both traditional and renewable energy industries. Previous research has shown the potential of machine learning (ML) techniques, such as Extreme Gradient Boosting (XGBoost), for this task, but few studies have explored the comparative performance of multiple ML algorithms. The aim of this study is to evaluate and compare multiple ML algorithms, namely Gradient Boosting Regressor (GBR), Random Forest (RF), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), for predicting geothermal gradients using a dataset obtained from well log measurements across Colombia. Models were compared based on performance metrics such as R-squared (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) to identify the most accurate and robust algorithm. The RF model achieved the best test performance (R2 = 0.54), showing robust generalization. While XGBoost showed strong test results (R2 = 0.51), it exhibited signs of overfitting compared to its exceptional training score (R2 = 0.93). KNN delivered moderate accuracy as well (R2 = 0.53), while MLP underperformed (R2 = 0.43). The results position RF as the most reliable algorithm for this application, providing valuable insights to optimize geothermal exploration and support Colombia's renewable energy transition through data-driven approaches.
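The regressor comparison described above can be sketched with scikit-learn. This is an illustrative skeleton on synthetic data, not the Colombian well-log dataset; model hyper-parameters are defaults, not the study's.

```python
# Hypothetical sketch of the comparison protocol: fit GBR, RF, KNN, and MLP
# regressors and score each with R2, MAE, and RMSE on a held-out split.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

models = {
    "GBR": GradientBoostingRegressor(random_state=1),
    "RF": RandomForestRegressor(random_state=1),
    "KNN": KNeighborsRegressor(),
    "MLP": MLPRegressor(max_iter=2000, random_state=1),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "R2": r2_score(y_te, pred),
        "MAE": mean_absolute_error(y_te, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
    }
```

Comparing train and test R2 per model, as the authors did for XGBoost, is how the overfitting signal (train 0.93 versus test 0.51) would surface.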
- Research Article
13
- 10.1007/s00228-022-03445-5
- Dec 24, 2022
- European Journal of Clinical Pharmacology
Machine Learning (ML) algorithms represent an interesting alternative to maximum a posteriori Bayesian estimators (MAP-BE) for tacrolimus AUC estimation, but it is not known whether training an ML model on a smaller number of full pharmacokinetic (PK) profiles (= "true" reference AUCs) provides better performance than using a larger dataset of less accurate AUC estimates. The objectives of this study were: to develop and benchmark ML algorithms trained using full PK profiles to estimate MeltDose®-tacrolimus individual AUCs from 2 or 3 blood concentrations; and to compare their performance to MAP-BE. Data from liver (n = 113) and kidney (n = 97) transplant recipients involved in MeltDose-tacrolimus PK studies were used for the training and evaluation of the ML algorithms. "True" AUC0-24h was calculated for each patient using the trapezoidal rule on the full PK profile. ML algorithms were trained to estimate the tacrolimus true AUC using 2 or 3 blood concentrations. Performances were evaluated in two external sets of 16 (renal) and 48 (liver) transplant patients. The best estimation performances were obtained with the MARS algorithm and the following limited sampling strategies (LSS): predose (0), 8, and 12 h post-dose (rMPE = -1.28%, rRMSE = 7.57%), or 0 and 12 h (rMPE = -1.9%, rRMSE = 10.06%). In the external datasets, the performances of the final ML algorithms based on two samples in kidney (rMPE = -3.1%, rRMSE = 11.1%) or liver transplant recipients (rMPE = -3.4%, rRMSE = 9.86%) were as good as or better than those of MAP-BEs based on three time points. The MARS ML models developed using "true" MeltDose®-tacrolimus AUCs yielded accurate individual estimations using only two blood concentrations.
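The "true" reference AUC0-24h computation described above is just the trapezoidal rule over a full concentration-time profile. A minimal sketch, with made-up sampling times and concentrations for illustration:

```python
# Trapezoidal-rule AUC over a full (hypothetical) tacrolimus PK profile.
# These times and concentrations are invented, not from the study.
import numpy as np

times_h = np.array([0, 1, 2, 4, 6, 8, 12, 24], dtype=float)      # h post-dose
conc_ng_ml = np.array([5.0, 18.0, 14.0, 10.0, 8.5, 7.5, 6.5, 5.5])

# AUC0-24h = sum of trapezoid areas between consecutive sampling times.
auc_0_24 = float(np.sum(np.diff(times_h)
                        * (conc_ng_ml[:-1] + conc_ng_ml[1:]) / 2))  # ng*h/mL
```

The ML models are then trained to map two or three of these concentrations (e.g., the 0 h and 12 h points) to this full-profile AUC.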
- Research Article
2
- 10.1007/s42452-025-07268-8
- Jan 1, 2025
- Discover Applied Sciences
Water droplet erosion (WDE) is a critical degradation phenomenon that significantly affects component lifespan and performance in power generation, aerospace, and wind energy industries. The incubation period—the initial phase before visible material loss occurs—is particularly crucial for maintenance planning and material selection yet remains challenging to predict accurately due to the complex interplay of material properties and impact conditions. Traditional empirical models have shown limited predictive capability due to their reliance on numerous adjustable parameters with insufficient physical interpretation. This study aimed to develop and validate a machine learning (ML) approach for accurately predicting the WDE incubation period across different metallic materials and impact conditions. The performance of various ML algorithms is evaluated while investigating the effect of data transformation techniques on prediction accuracy. A range of ML models—linear regression (LR), decision tree regressor (DT), random forest regressor (RF), gradient boosting regressor (GBR), and artificial neural networks (ANN)—were trained and validated using experimental data from five different alloys under various impact conditions. Data transformation methods significantly enhanced model performance, with the LR model using Box-Cox transformation achieving the highest accuracy (R2 > 90%, low MAE), followed by the ANN model with Yeo-Johnson transformation (R2 > 85%). Feature importance analysis through SHAP values revealed that impact velocity and surface hardness were the most influential factors affecting incubation period, providing valuable physical insights into the erosion mechanism. Hyperparameter optimization techniques showed minimal improvement in model performance, suggesting that the transformations effectively captured the underlying relationships in the data. 
This research represents the first comprehensive application of ML techniques to WDE incubation period prediction, establishing a methodological framework that integrates experimental data, statistical analysis, and advanced ML algorithms. Unlike previous approaches, our methodology (1) systematically evaluates multiple ML algorithms and transformation techniques for WDE prediction, (2) provides quantitative assessment of feature importance that aligns with physical understanding of erosion mechanisms, (3) demonstrates superior predictive accuracy compared to traditional empirical models, and (4) offers a generalizable approach applicable across different metallic materials and impact conditions. This work bridges the gap between data-driven modeling and physical understanding of WDE, providing a valuable tool for engineers to optimize material selection and maintenance strategies in erosion-prone applications.
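The transformation step the abstract credits for most of the accuracy gain — a power transform of the skewed incubation-period target before a linear fit — can be sketched as below. The data, the functional form, and the use of `TransformedTargetRegressor` are all illustrative assumptions, not the paper's pipeline.

```python
# Sketch of a Box-Cox-transformed linear regression for a positive, skewed
# target, in the spirit of the LR + Box-Cox model the abstract describes.
# Features (impact velocity, surface hardness) and the target relation are
# synthetic, not the five-alloy WDE dataset.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
velocity = rng.uniform(100, 500, 200)    # impact velocity (m/s), made up
hardness = rng.uniform(150, 450, 200)    # surface hardness (HV), made up
# Invented log-linear relationship with mild noise, strictly positive target.
incubation = np.exp(3 + 0.004 * hardness - 0.005 * velocity
                    + rng.normal(0, 0.05, 200))
X = np.column_stack([velocity, hardness])

model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method="box-cox"),  # target must be > 0
)
model.fit(X, incubation)
r2 = model.score(X, incubation)  # R2 in the original (untransformed) scale
```

Swapping `method="yeo-johnson"` lifts the positivity requirement, matching the ANN variant the abstract mentions.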
- Research Article
51
- 10.1016/j.egyr.2023.08.009
- Aug 16, 2023
- Energy Reports
Prediction of oil and gas pipeline failures through machine learning approaches: A systematic review
- Research Article
2
- 10.1371/journal.pone.0319519
- Apr 9, 2025
- PloS one
This study aimed to develop models for predicting the 30-day mortality of sepsis-associated delirium (SAD) by multiple machine learning (ML) algorithms. On the whole, a cohort of 3,197 SAD patients were collected from the Medical Information Mart for Intensive Care (MIMIC)-IV database. Among them, a total of 659 (20.61%) patients died following SAD. The patients who died were about 73.00 (62.00, 82.00) years old and mostly male (56.75%). Recursive feature elimination (RFE) was used to distinguish risk factors. Subsequently, six ML algorithms including artificial neural network (NNET), gradient boosting machine (GBM), adaptive boosting (Ada), random forest (RF), eXtreme Gradient Boosting (XGB) and logistic regression (LR) were employed to establish models to predict the 30-day mortality of SAD. The performance of models was assessed via both discrimination and calibration by cross-validation with 100 resamples. Overall, 10 independent predictors, including Glasgow Coma Scale (GCS), Sequential Organ Failure Assessment (SOFA), anion gap (AG), continuous renal replacement therapy (CRRT), temperature, mean corpuscular hemoglobin concentration (MCHC), vasopressor, blood urea nitrogen (BUN), base excess (BE), and bicarbonate were identified as independent predictors for the 30-day mortality of SAD. The validation cohort demonstrated that all these six models had relatively favorable differentiation, while among them, the GBM model had the highest area under the curve (AUC) of 0.845 (95% Confidence Interval (CI): 0.816, 0.874). Furthermore, the calibration curve of these six models was close to the diagonal line in the validation sets. As for decision curve analysis, the predictive models were clinically useful as well. Based on real-world research, we developed ML models to provide personalized predictions of delirium-related mortality in sepsis patients, potentially enabling clinicians to identify high-risk SAD patients more promptly.
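The two-stage workflow above — recursive feature elimination to pick predictors, then a boosted model for 30-day mortality — can be sketched as follows. Synthetic data replace the MIMIC-IV cohort, and the choice of a logistic-regression ranker inside RFE is an assumption.

```python
# Hypothetical sketch: RFE feature selection followed by a gradient boosting
# classifier, mirroring the 10 retained predictors in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=30, n_informative=10,
                           weights=[0.8], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Keep the 10 strongest features, as the study kept 10 independent predictors.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

gbm = GradientBoostingClassifier(random_state=42).fit(X_tr_sel, y_tr)
auc = roc_auc_score(y_te, gbm.predict_proba(X_te_sel)[:, 1])
```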
- Research Article
- 10.6002/ect.2024.0136
- Aug 1, 2024
- Experimental and clinical transplantation : official journal of the Middle East Society for Organ Transplantation
De novo malignancies are the most common cause of death after solid-organ transplant. Here, we aimed to summarize standard incidence ratios of de novo malignancies after liver and kidney transplant within the same geographical locations, compare these ratios among different types of de novo malignancies after liver and kidney transplant, and elucidate differences in de novo malignancies between liver and kidney transplant recipients. We performed a systematic review to identify studies on standard incidence ratios of de novo malignancies after liver and kidney transplant in the United Kingdom, Sweden, South Korea, and Taiwan. Four articles reported standard incidence ratios of de novo malignancies in 14 016 liver transplant recipients (mean follow-up 4.3 ± 0.7 y) and 48 179 kidney transplant recipients (mean follow-up 6.1 ± 2.1 y). Mean ratios of oropharyngeal, pulmonary, colorectal, renal, and breast malignancies were 5.3, 1.6, 1.9, 1.8, and 1.1, respectively, after liver transplant and 3.2, 1.7, 1.5, 17.0, and 1.3, respectively, after kidney transplant. Mean ratios of bladder, cervix-uterus, and stomach de novo malignancies were 1.8, 2.0, and 2.9, respectively, after liver transplant and 13.0, 1.9, and 1.9, respectively, after kidney transplant. Mean ratios of prostatic and esophageal malignancies were 1.6 and 1.8 after liver transplant and 1.2 and 1.1 after kidney transplant. Mean ratios of ovarian cancer were 1.2 and 2.9, respectively, after liver and kidney transplant. Low frequencies and lower standard incidence ratios were observed for testicular, ovarian, and central nervous system malignancies after kidney and liver transplant. Standard incidence ratios of oropharyngeal and hepatic malignancies were higher after liver transplant compared with kidney transplant. After kidney transplant, standardized ratios for renal malignancies were 9.4 times and for bladder malignancies 7.2 times higher compared with liver transplant recipients.
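The standardized incidence ratios summarized above are simply observed case counts divided by the counts expected in a matched general population. The case counts below are invented to show the arithmetic, not taken from the reviewed studies:

```python
# Standardized incidence ratio (SIR): observed cases / expected cases in an
# age- and sex-matched general population. Numbers are illustrative only.
def sir(observed_cases: float, expected_cases: float) -> float:
    """Return the standardized incidence ratio."""
    return observed_cases / expected_cases

# e.g. 34 renal cancers observed where 2 were expected gives an SIR of 17.0,
# the kind of value reported above for renal malignancy after kidney transplant.
example = sir(34, 2)
```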
- Discussion
- 10.1053/j.gastro.2009.10.006
- Oct 29, 2009
- Gastroenterology
This Month in Gastroenterology
- Research Article
10
- 10.1097/tp.0000000000003304
- Aug 18, 2020
- Transplantation
Artificial Intelligence-related Literature in Transplantation: A Practical Guide.
- Research Article
64
- 10.1109/tmc.2015.2461216
- Jun 1, 2016
- IEEE Transactions on Mobile Computing
The impact of network performance on the quality of experience (QoE) for various services is not well understood. Assessing the impact of different network and channel conditions on the user experience is important for improving telecommunication services. The QoE of various wireless services, including VoIP, video streaming, and web browsing, has been at the epicenter of recent networking activities. The majority of such efforts aim to characterize the user experience, analyzing various types of measurements, often in an aggregate manner. This paper proposes the MLQoE, a modular algorithm for user-centric QoE prediction. The MLQoE employs multiple machine learning (ML) algorithms, namely, Artificial Neural Networks, Support Vector Regression machines, Decision Trees, and Gaussian Naive Bayes classifiers, and tunes their hyper-parameters. It uses the Nested Cross Validation (nested CV) protocol to select the best classifier and the corresponding best hyper-parameter values, and estimates the performance of the final model. The MLQoE is conservative in its performance estimation despite the multiple rounds of model induction. The MLQoE is modular, in that it can be easily extended to include other ML algorithms. The MLQoE automatically selects the ML algorithm that exhibits the best performance, and its parameters, given the dataset used as input. It uses empirical measurements based on network metrics (e.g., packet loss, delay, and packet interarrival) and subjective opinion scores reported by actual users. This paper extensively evaluates the MLQoE using three unidirectional datasets containing VoIP calls over wireless networks under various network conditions, with feedback from subjects collected in field studies. Moreover, it performs a preliminary analysis to assess the generality of the methodology using bidirectional VoIP and video traces. The MLQoE outperforms several state-of-the-art algorithms, resulting in fairly accurate predictions.
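The nested CV protocol the MLQoE relies on — an inner loop that tunes hyper-parameters and an outer loop that gives an unbiased performance estimate — can be sketched with scikit-learn. Synthetic regression targets stand in for the subjective opinion scores, and the SVR-only grid is a simplification of the MLQoE's multi-model search.

```python
# Nested cross-validation sketch: GridSearchCV (inner loop, hyper-parameter
# tuning) wrapped inside cross_val_score (outer loop, generalization estimate).
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=7)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=7)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=7)

# The inner search is refit on each outer training fold, so the outer scores
# never see data used for tuning -- the "conservative" property noted above.
tuned = GridSearchCV(SVR(), {"C": [1, 10, 100]}, cv=inner_cv)
outer_scores = cross_val_score(tuned, X, y, cv=outer_cv, scoring="r2")
```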
- Research Article
56
- 10.1186/1471-2105-15-82
- Mar 24, 2014
- BMC Bioinformatics
Background: Transient protein-protein interactions (PPIs), which underlie most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high-resolution structures of both binding partners, and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of these restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine learning based PPI predictors is due, in order to highlight potential areas for improvement.
Results: The presence of unknown interaction sites, a result of limited knowledge about protein interactions in the testing set, dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions.
Conclusion: Current methods of evaluating machine-learning based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of unidentified interaction sites.
Changes to predictors’ training sets will be integral to the future progress of interface prediction by machine learning methods. We reveal the need for a larger test set of well studied proteins or domain-specific scoring algorithms to compensate for poor interaction site identification on proteins in general.
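The MCC score used above to compare RAD-T against other predictors is the Matthews correlation coefficient; a minimal illustration on a toy label vector (the labels are invented):

```python
# Matthews correlation coefficient on a toy binary prediction:
# 1.0 = perfect agreement, 0 = chance-level, -1.0 = total disagreement.
from sklearn.metrics import matthews_corrcoef

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]  # one false negative, one false positive
mcc = matthews_corrcoef(y_true, y_pred)
```

Unlike raw accuracy, MCC stays informative on the imbalanced interface/non-interface residue labels typical of PPI datasets, which is why it is the metric of choice here.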
- Research Article
2
- 10.1007/s40098-025-01297-1
- Jul 5, 2025
- Indian Geotechnical Journal
The strength and stability of pavement subgrades are heavily influenced by the geotechnical properties of underlying soils, which can exhibit high variability in regions with complex soil compositions, such as lateritic–lithomargic subgrades. This study introduces a novel approach by employing machine learning (ML) models to predict the California bearing ratio (CBR) of lateritic–lithomargic subgrade soils found in the Karavali Karnataka region of southern peninsular India. The primary novelty of this research lies in leveraging ML techniques to analyze soil characteristics and develop predictive ML models for CBR, reducing the reliance on extensive and time-consuming laboratory testing. Soil samples were collected from 20 different locations at low-volume road junctions along the stretch of KAR-SH-1 to ensure diversity and representativeness of the soil, and were subjected to a series of tests to determine basic geotechnical properties (gradation, plasticity index, specific gravity, and compaction) and the soaked CBR value. Test data from 80% of the locations were used to develop the ML models, while data from the remaining 20% were used to validate them. Finally, models with an acceptable level of statistical performance were given higher weight, with the best-performing regressors achieving lower error rates than the other models. This study applies multiple ML algorithms, including multiple linear regression (MLR), decision tree (DT), random forest (RF), support vector machine (SVM), AdaBoost, and gradient boosting regressor (GBR), to identify the relationships between these soil parameters and CBR values.
The modeling pipeline incorporated standardized pre-processing, k-fold cross-validation, and advanced performance metrics including MAE, MSE, RMSE, R2, CV-Mean, a20-index, a10-index, performance index (PI), improvement assessment (IA), and objective function (OBJ). The key results show that the AdaBoost and GBR models outperformed the others in prediction accuracy, with AdaBoost achieving the lowest RMSE (0.378284) and the highest R2 score (0.952), along with satisfactory results on the other key metrics, indicating a robust model for practical use in pavement applications. The ML models successfully identified key soil properties that correlate with CBR, facilitating more efficient and accurate pavement subgrade evaluation. Overall, this study highlights the potential of ML-based approaches to streamline the design of pavements and embankments, particularly in regions with lateritic–lithomargic soils, by providing sustainable, rapid, cost-effective, and data-driven solutions for the highway construction industry.
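Among the metrics listed, the a20-index is less standard than MAE or R2; it is commonly defined as the fraction of predictions falling within ±20% of the measured value. A small sketch, with the definition assumed from common geotechnical ML usage and invented measured/predicted CBR pairs:

```python
# a20-index: share of samples whose predicted/measured ratio lies in
# [0.8, 1.2]. Definition assumed; data invented for illustration.
import numpy as np

def a20_index(y_true, y_pred) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = y_pred / y_true
    return float(np.mean((ratio >= 0.8) & (ratio <= 1.2)))

measured = [10.0, 12.0, 8.0, 15.0]     # hypothetical soaked CBR values (%)
predicted = [11.0, 9.0, 8.2, 14.0]     # hypothetical model outputs
score = a20_index(measured, predicted)  # 3 of 4 within +/-20%
```

The a10-index is the same statistic with a ±10% band.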
- Research Article
41
- 10.1002/ijc.31782
- Oct 26, 2018
- International Journal of Cancer
In the setting of liver transplant (LT), the survival after the diagnosis of de novo malignancies (DNMs) has been poorly investigated. In this study, we assessed the impact of DNMs on survival of LT recipients as compared to corresponding LT recipients without DNM. A nested case-control study was conducted in a cohort of 2,818 LT recipients enrolled in nine Italian centres between 1985 and 2014. Cases were 244 LT recipients who developed DNMs after LT. For each case, two controls matched for gender, age, and year at transplant were selected by incidence density sampling among cohort members without DNM. The survival probabilities were estimated using the Kaplan-Meier method. Hazard ratios (HRs) of death and 95% confidence intervals (CIs) were estimated using Cox proportional hazard models. The all-cancer 10-year survival was 43% in cases versus 70% in controls (HR = 4.66; 95% CI: 3.17-6.85). Survival was impaired in cases for all the most frequent cancer types, including lung (HR = 37.13; 95% CI: 4.98-276.74), non-Hodgkin lymphoma (HR = 6.57; 95% CI: 2.15-20.01), head and neck (HR = 4.65; 95% CI: 1.81-11.95), and colon-rectum (HR = 3.61; 95% CI: 1.08-12.07). The survival gap was observed for both early and late mortality, although the effect was more pronounced in the first year after cancer diagnosis. No significant differences in survival emerged for Kaposi's sarcoma and nonmelanoma skin cancers. The survival gap herein quantified included a broad range of malignancies following LT and prompts close monitoring during the post-transplant follow-up to ensure early cancer diagnosis and to improve survival.
- Conference Article
- 10.1063/5.0122942
- Jan 1, 2023
Despite considerable success in knowledge discovery, conventional machine learning algorithms may fail to achieve satisfactory performance when dealing with imbalanced, complex, noisy, and high-dimensional data. In this context, it is essential to consider how to efficiently build an adequate knowledge-discovery and mining model. Ensemble learning aims to consolidate classical machine learning (ML) algorithms, data modeling, and data mining into a unified framework. Text categorization is a critical application that uses the unified ensemble learning framework to detect a new article's class. This paper develops a two-layer stacking ensemble model containing different ML algorithms. Since a stacking model consists of stacked layers, each built with multiple ML algorithms, we constructed the first layer of our stacking model with three classifiers (Multinomial Naïve Bayes (MNB), logistic regression (LR), and k-Nearest Neighbors (k-NN)), while the second layer applies a random forest classification algorithm. The proposed stacking ensemble model is compared with the classical ML algorithms (MNB, LR, and k-NN) in accuracy and error measures. The results show that the stacking model outperforms the MNB and k-NN algorithms, whose accuracy reached 89.72% and 89.75%, respectively, while LR achieves an accuracy of 91.5%, close to that of the proposed model, which equals 91.66%.
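The two-layer design described above — MNB, LR, and k-NN as first-layer learners with a random forest as the second layer — can be sketched with scikit-learn's `StackingClassifier`. The toy six-document corpus is invented; the paper's actual dataset and vectorization are not specified here.

```python
# Hypothetical sketch of the two-layer stacking ensemble for text
# categorization: first layer MNB + LR + k-NN, second layer random forest.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

texts = ["stock markets fell", "team wins final", "markets rally on earnings",
         "striker scores twice", "shares slump again", "coach praises squad"]
labels = ["business", "sport", "business", "sport", "business", "sport"]

stack = make_pipeline(
    CountVectorizer(),  # bag-of-words features for the toy corpus
    StackingClassifier(
        estimators=[("mnb", MultinomialNB()),
                    ("lr", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier(n_neighbors=3))],
        final_estimator=RandomForestClassifier(random_state=0),
        cv=3,  # first-layer predictions come from out-of-fold estimates
    ),
)
stack.fit(texts, labels)
pred = stack.predict(["markets slump"])[0]
```

The `cv` parameter matters: the second-layer random forest is trained on out-of-fold first-layer predictions, which keeps it from simply memorizing the base learners' training-set outputs.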
- Research Article
45
- 10.1016/j.neuroscience.2023.01.029
- Feb 2, 2023
- Neuroscience
In Mild Cognitive Impairment (MCI), identifying a high risk of conversion to Alzheimer's Disease Dementia (AD) is a primary goal for patient management. Machine Learning (ML) algorithms are widely employed to pursue data-driven diagnostic and prognostic goals. An agreement on the stability of these algorithms, when applied to different biomarkers and other conditions, is far from being reached. In this study, we compared the prognostic performances of three supervised ML algorithms fed with multimodal biomarkers of MCI subjects obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Random Forest, Gradient Boosting, and eXtreme Gradient Boosting algorithms predict MCI conversion to AD. They can also be employed simultaneously, with a voting procedure, to improve predictivity. AD prediction accuracy is influenced by the nature of the data (i.e., neuropsychological test scores, cerebrospinal fluid AD-related proteins and APOE ε4, and cerebral structural MRI (sMRI) data). In our study, independent of the applied ML algorithms, sMRI data showed the lowest accuracy (0.79) compared to the other classes. Multimodal data improved the algorithms' performances by combining clinical and biological measures. Accordingly, using the three ML algorithms, the highest accuracy (0.90) was reached by employing neuropsychological and AD-related biomarkers. Finally, the feature selection procedure indicated that the most critical variables in the respective classes were the ADAS-Cog-13 scale, medial temporal lobe and hippocampus atrophy, and the ratio between phosphorylated Tau and Aβ42 proteins. In conclusion, our data support the notion that using multiple ML algorithms and multimodal biomarkers helps make more accurate and solid predictions.
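The voting procedure mentioned above can be sketched with a soft-voting ensemble of three tree-based classifiers. To keep the sketch dependency-free, scikit-learn's ExtraTrees stands in for XGBoost, and synthetic data replace the ADNI biomarkers.

```python
# Hypothetical sketch of the voting ensemble: three tree-based classifiers
# average their predicted probabilities (soft voting) for conversion risk.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=3)),
                ("gb", GradientBoostingClassifier(random_state=3)),
                ("et", ExtraTreesClassifier(random_state=3))],  # XGBoost stand-in
    voting="soft",  # average class probabilities rather than majority labels
)
vote.fit(X_tr, y_tr)
accuracy = vote.score(X_te, y_te)
```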
- Research Article
- 10.2174/0126662906402878251122062713
- Dec 3, 2025
- The International Journal of Gastroenterology and Hepatology Diseases
Over the last few decades, advances in liver transplantation (LT) have consistently improved patient survival, providing recipients with longer life expectancy and better quality of life. However, patients undergoing prolonged immunosuppression face an increased risk of de novo malignancies (DNMs) due to the oncogenic effects of immunosuppressive drugs and other established risk factors. These include chronic viral infections, sun exposure, smoking, the underlying etiology of liver disease (such as alcoholic liver disease and primary sclerosing cholangitis), and various biodemographic factors, including age, body weight, lifestyle choices, and ethnicity. As a result, DNMs have become one of the leading causes of late mortality among LT recipients, accounting for 20–25% of post-transplant deaths. Nonmelanoma skin cancers are the most common DNMs, representing 35–40% of cases, with a cumulative risk 10–20 times higher than that of age- and sex-matched individuals in the general population. Post-transplant lymphoproliferative disorders (PTLD) are the second most frequent DNMs, particularly prevalent in pediatric recipients (5–20% of total DNMs), and are associated with significantly lower survival rates. Solid organ tumors, which account for 40–50% of DNMs, primarily involve the lung, head and neck, and colorectal sites, and demonstrate a two- to threefold higher incidence and more aggressive progression compared to the general population. The development of solid organ DNMs has a major impact on long-term outcomes following LT, highlighting the need for updated reviews focused on prevention and management.
The increased incidence of DNMs in LT recipients underscores the importance of minimizing immunosuppressive therapy, stratifying patients according to cancer risk, implementing early protective strategies (such as reducing exposure to known risk factors), and establishing tailored, cost-effective long-term screening protocols to detect malignancies at an early stage. Early detection enables timely treatment, ultimately improving long-term survival and quality of life. Future prospective studies are needed to optimize and validate surveillance strategies for this high-risk population.