Machine Learning Random Forest Model Research Articles

Abstract

Background: Machine learning (ML) in translational medicine has led to prediction of clinical outcomes and identification of new biomarkers. We employ ML in prediction of pathologic complete response (pCR) in high-risk breast cancer patients in the neoadjuvant I-SPY2 TRIAL where not all novel agents have strong predictive biomarkers. Leveraging an ML approach using progressively expanded candidate genes, we explore the limitations of using only known mechanisms of action in predicting pCR, and the extent to which biology outside known drug action improves response prediction in the first 10 arms of the trial.

Methods: ML random forest models were developed in I-SPY2 patients (n=982) with pre-treatment gene expression and pCR data across 10 treatment arms (PMID: 35623341), including inhibitors of HER2: neratinib (N), pertuzumab (P), T-DM1/P; AKT (MK-2206); IGF1R (ganitumab); HSP90 (ganetespib); PARP/DNA repair (veliparib/carboplatin, VC); ANG1/2 (trebananib, T); immune checkpoints (PD1-inh); and Control (Ctr). Each HR/HER2 receptor/treatment arm subset (m=27) was evaluated independently. We employed a three-pronged feature-selection approach using (1) genes restricted to known mechanism of action of individual I-SPY2 agents (k=10 to 88 genes); (2) genes expanded to include targeted pathways for all 10 agents/combinations (k=282); and (3) an unbiased whole genome approach (k=17,990). Samples were partitioned with 75% used for training and cross-validation, and 25% held out as test sets. Predictive ML models were defined as those with performance ≥ 0.90 based on different performance metrics (e.g., AUC, sensitivity, specificity).

Results: For each of the 27 subtype-treatment subsets, at least one high performing model was identified. In 6 subtype-treatment subsets, mechanism of action genes were sufficient to predict pCR: AKT/PI3K/HER genes in HR+HER2- N and HR-HER2+ P; DNA repair genes in HR+HER2- VC; angiogenesis-associated genes in HR+HER2+ T; and immune-associated genes in both HR+HER2- and HR-HER2- PD1-inh subsets. Expanded targeted pathway models were required to identify predictive models in 8 additional subtype-treatment pairs from the N, T-DM1/P, MK-2206, VC, T, and HER2+ Ctr arms, with significant contribution of DNA repair, immune, and HSP90 genes for multiple arms. A genome-wide approach was required for the remaining 13 subtype-treatment pairs with no previous models from the N, P, MK-2206, ganitumab, ganetespib, T, and HER2- Ctr arms. Even for subtype-treatment pairs where mechanism of action gene sets were sufficient for reasonable models, expanded gene sets resulted in improved performance. For instance, metabolism genes improved model performance for HR-HER2+ in N and Ctr, and for HR+HER2- in the PD1-inh arm; and mitochondrial and protein folding dysfunction genes improved response prediction in HR-HER2- in the ganetespib arm.

Conclusion: Our study identifies mechanism of action biomarkers associated with response to each drug and elucidates possible off-target effects contributing to observed drug sensitivity and resistance.

Citation Format: Rosalyn W. Sayaman, Denise M. Wolf, Christina Yau, Julia Wulfkuhle, Emanuel F. Petricoin, Lamorna Brown-Swigart, Tam Binh Bui, Gillian L. Hirst, Diane Heditsian, W. Fraser Symmans, Angela DeMichele, Mark LaBarge, Laura J. Esserman, Laura van ‘t Veer. Machine learning elucidates biology of response within and outside the mechanisms of action of therapeutic agents in the I-SPY2 breast cancer TRIAL [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Advances in Breast Cancer Research; 2023 Oct 19-22; San Diego, California. Philadelphia (PA): AACR; Cancer Res 2024;84(3 Suppl_1):Abstract nr A066.
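The abstract's criterion for a "predictive" model (performance ≥ 0.90 on metrics such as AUC, sensitivity, and specificity) can be illustrated with a minimal Python sketch. The function names, the toy labels and scores, and the reading that a model qualifies when at least one metric reaches the threshold are all assumptions made for illustration; the abstract does not say how the metrics were combined, and this is not the authors' code.

```python
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = pCR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def is_predictive(metrics: Dict[str, float], threshold: float = 0.90) -> bool:
    """ASSUMPTION: a model counts as predictive if any metric reaches the threshold."""
    return any(value >= threshold for value in metrics.values())


# Toy held-out test set (made-up values, not trial data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)
metrics = {"auc": auc(y_true, scores), "sensitivity": sens, "specificity": spec}
print(metrics, is_predictive(metrics))
```

Requiring every metric to reach 0.90 instead would only mean replacing `any` with `all` in `is_predictive`.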

Background: Effectively and efficiently diagnosing patients who have COVID-19 with the accurate clinical type of the disease is essential to achieve optimal outcomes for the patients as well as to reduce the risk of overloading the health care system. Currently, severe and nonsevere COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 infection in the different disease types. In addition, these type-defining features may not be readily testable at the time of diagnosis.

Objective: In this study, we aimed to use a machine learning approach to understand COVID-19 more comprehensively, accurately differentiate severe and nonsevere COVID-19 clinical types based on multiple medical features, and provide reliable predictions of the clinical type of the disease.

Methods: For this study, we recruited 214 confirmed patients with nonsevere COVID-19 and 148 patients with severe COVID-19. The clinical characteristics (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between two clinical types. Machine learning random forest models based on all the features in each modality as well as on the top 5 features in each modality combined were developed and validated to differentiate COVID-19 clinical types.

Results: Using clinical and laboratory results independently as input, the random forest models achieved >90% and >95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical features modality, and dimerized plasmin fragment D, high sensitivity troponin I, absolute neutrophil count, interleukin 6, and lactate dehydrogenase for the laboratory testing modality, in descending order). Using these top 10 multimodal features as the only input instead of all 52 features combined, the random forest model was able to achieve 97% predictive accuracy.

Conclusions: Our findings shed light on how the human body reacts to SARS-CoV-2 infection as a unit and provide insights on effectively evaluating the disease severity of patients with COVID-19 based on more common medical features when gold standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory test results should be applied when accuracy is the priority.
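The selection step described in this abstract (rank features by importance within each modality, keep the top 5 from each, and use the combined 10 as model input) can be sketched in Python as follows. The importance values are placeholders invented for illustration; only the ordering of the named features follows the abstract. The helper names and the two `hypothetical_feature_*` entries are assumptions, not part of the study.

```python
from typing import Dict, List


def top_k_features(importances: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k feature names with the highest importance, in descending order."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def combine_top_features(modalities: Dict[str, Dict[str, float]], k: int = 5) -> List[str]:
    """Concatenate the top-k features of each modality into one input list."""
    combined: List[str] = []
    for importances in modalities.values():
        combined.extend(top_k_features(importances, k))
    return combined


# Placeholder scores (NOT from the study); the last entry in each dict is hypothetical.
clinical = {
    "age": 0.30,
    "hypertension": 0.20,
    "cardiovascular disease": 0.15,
    "gender": 0.10,
    "diabetes": 0.08,
    "hypothetical_feature_a": 0.01,
}
laboratory = {
    "dimerized plasmin fragment D": 0.28,
    "high sensitivity troponin I": 0.22,
    "absolute neutrophil count": 0.16,
    "interleukin 6": 0.12,
    "lactate dehydrogenase": 0.09,
    "hypothetical_feature_b": 0.01,
}

selected = combine_top_features({"clinical": clinical, "laboratory": laboratory})
print(selected)
```

In practice the importance scores would come from a fitted random forest rather than a hand-written dictionary; the selection logic stays the same.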

Articles published on Machine Learning Random Forest Model

Long-term prediction of algal chlorophyll based on empirical models and the machine learning approach in relation to trophic variation in Juam Reservoir, Korea

Discordance between aPTT and anti-Xa in monitoring heparin anticoagulation in mechanical circulatory support.

Abstract A066: Machine learning elucidates biology of response within and outside the mechanisms of action of therapeutic agents in the I-SPY2 breast cancer TRIAL

Meteorologically normalized spatial and temporal variations investigation using a machine learning-random forest model in criteria pollutants across Tehran, Iran

Metagenomics-Based Microbial Ecological Community Threshold and Indicators of Anthropogenic Disturbances in Estuarine Sediments.

Dynamics of the Gut Mycobiome in Patients With Ulcerative Colitis

Establishment and validation of a clinical nomogram model based on serum YKL-40 to predict major adverse cardiovascular events during hospitalization in patients with acute ST-segment elevation myocardial infarction.

Prognostic MicroRNA Fingerprints Predict Recurrence of Early-Stage Hepatocellular Carcinoma Following Hepatectomy.

Nitrogen isotope enrichment predicts growth response of Pinus radiata in New Zealand to nitrogen fertiliser addition

A simplified prediction model for end-stage kidney disease in patients with diabetes

Disturbance of serum lipid metabolites and potential biomarkers in the Bleomycin model of pulmonary fibrosis in young mice

Yeast and Lactic Acid Bacteria Dominate the Core Microbiome of Fermented ‘Hairy’ Tofu (Mao Tofu)

Spatiotemporal Monitoring of Soil CO2 Efflux in a Subtropical Forest during the Dry Season Based on Field Observations and Remote Sensing Imagery

Applying a machine learning modelling framework to predict delayed linkage to care in patients newly diagnosed with HIV in Mecklenburg County, North Carolina, USA.

A Multimodality Machine Learning Approach to Differentiate Severe and Nonsevere COVID-19: Model Development and Validation.

A novel microfluidic-based assay to determine high-risk pancreatic cysts for surgery utilizing pancreatic cyst fluid: an international multicenter study

Machine learning model demonstrates stunting at birth and systemic inflammatory biomarkers as predictors of subsequent infant growth – a four-year prospective study

Deep profiling of apoptotic pathways with mass cytometry identifies a synergistic drug combination for killing myeloma cells.

On the interpretability of machine learning-based model for predicting hypertension

A Phenotype-Based Approach for the Substrate Water Status Forecast of Greenhouse Netted Muskmelon.
