Large language models (LLMs) have the potential to revolutionize the healthcare industry. They could reduce the burden on healthcare systems, increase care accessibility in underserved areas, and provide multilingual support to break down language barriers. However, these models, although equipped with vast amounts of medical knowledge and the ability to understand and generate human-like text, require properly curated input data (i.e., prompt engineering) to produce accurate diagnoses and reliable, personalized treatment plans for patients. Multiple Myeloma (MM) is a complex hematological malignancy characterized by the uncontrolled proliferation of plasma cells in the bone marrow. Disease management for MM is particularly challenging because of its multisystemic nature, driven by the varying volume of malignant cells within the bone marrow. Implementing LLMs for the clinical assessment of patients with MM requires feature selection to develop the most effective prompts for these models. Here, we used a machine learning (ML) approach to identify the salient features from a typical visit day that correlate most strongly with disease volume on the same day. These features are the best candidates to reflect the multisystemic and dynamic nature of MM at each visit, and they could be incorporated into LLM prompts to develop a system-based assessment for clinic visits. Methods: This study examined 1,472 clinical observations. To select a curated list of features associated with same-day M-spike values, 43 clinical and laboratory variables were input into an ML model. Random Forest (RF), an ensemble of regression trees suited to nonlinear multiple regression, was selected as the model. The data were randomly divided into a training set (80%) and a test set (20%) for model validation. Using bootstrapping to generate 500 data sets, a random forest of regression trees was constructed, and results and estimates were aggregated across the trees.
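A minimal sketch of the modeling setup described above, using synthetic data with the same dimensions (1,472 observations, 43 predictors). The data, feature effects, and noise level are illustrative assumptions, not the study's actual dataset; scikit-learn's `RandomForestRegressor` stands in for the RF implementation.

```python
# Illustrative sketch of the described RF workflow on SYNTHETIC data;
# the true dataset, variables, and effect sizes are not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_obs, n_features = 1472, 43  # dimensions matching the study description

X = rng.normal(size=(n_obs, n_features))
# Simulate an M-spike-like target driven by a few predictors plus noise
# (hypothetical coefficients chosen only for demonstration)
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.2, size=n_obs)

# 80% training / 20% test split, as in the study design
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each of the 500 trees is fit on a bootstrap resample of the training
# data, and predictions are aggregated (averaged) across trees.
rf = RandomForestRegressor(n_estimators=500, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # R^2 on the held-out 20%
```

The bootstrap aggregation is built into the forest itself: `bootstrap=True` makes each tree's training set a resample of the 80% split, so no manual resampling loop is needed.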
To determine the importance of each covariate, models with and without that covariate were compared. Results: The residual distribution of the RF model indicated that nearly all M-spike values predicted from the 43 variables were distributed evenly on either side of zero (Fig. 1). The weighted contribution of each of the 43 independent variables was determined by individually removing a variable from the ML algorithm and measuring the effect on the mean squared error (MSE) (Fig. 2). Removal of the first-lagged M-spike, serum total protein, second-lagged M-spike, serum IgG, serum IgM, and serum IgA had the greatest effects on the ML algorithm. M-spike values predicted by the ML algorithm correlated highly with laboratory-measured SPEP values, as indicated by Pearson and Spearman correlation coefficients close to +1: using all 43 variables, the Pearson coefficient was 0.96 and the Spearman coefficient was 0.91. Feature-selected modeling was then performed to reduce the number of variables needed to predict the M-spike. Five RF models with different predictor sets were compared: Model A included all 43 predictors, Model B the ten most important variables, Model C the top five variables, Model D the first- and second-lagged M-spike and serum total protein, and Model E the first-lagged M-spike and serum total protein. Pearson's r and root mean square error (RMSE) were used to compare the models. Pearson's r values for Models A, B, C, and D were 0.96, 0.96, 0.96, and 0.95, respectively, and the RMSE values were 0.21, 0.19, 0.19, and 0.22. Model E, using only two variables after feature selection, still accurately predicted the M-spike value (Pearson's r = 0.95; RMSE 0.22). The Pearson's r values for the feature-selected models A, B, C, D, and E were 0.95, 0.96, 0.96, 0.95, and 0.91.
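The importance-by-removal procedure and the full-versus-reduced model comparison can be sketched as follows. This is a small synthetic example (8 hypothetical features rather than 43); it illustrates the leave-one-covariate-out logic, not the study's actual variables or results.

```python
# Hedged sketch of importance-by-removal: drop each feature in turn,
# refit the forest, and record the rise in test-set MSE. Synthetic data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))
# Hypothetical target: feature 0 matters most, feature 1 weakly, rest are noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

def fit_mse(cols):
    """Fit an RF on the given feature columns; return test MSE and predictions."""
    rf = RandomForestRegressor(n_estimators=200, random_state=1)
    rf.fit(X_tr[:, cols], y_tr)
    pred = rf.predict(X_te[:, cols])
    return mean_squared_error(y_te, pred), pred

base_mse, _ = fit_mse(list(range(8)))

# Importance of feature j = increase in MSE when j is excluded and the model refit
importance = {}
for j in range(8):
    kept = [c for c in range(8) if c != j]
    mse_j, _ = fit_mse(kept)
    importance[j] = mse_j - base_mse

# Reduced ("feature-selected") model from the top-ranked features, compared
# with the full model by Pearson's r and RMSE, mirroring the Model A..E idea
top = sorted(importance, key=importance.get, reverse=True)[:2]
red_mse, red_pred = fit_mse(top)
print(top, pearsonr(y_te, red_pred)[0], np.sqrt(red_mse))
```

Note that retraining after removal (as described in the abstract) measures a variable's marginal contribution given the remaining covariates; correlated predictors can therefore share importance, which is one reason a two-variable model can perform nearly as well as the full one.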
Conclusion: Accurate prompt engineering to create a global assessment of the myeloma clone by an LLM requires a curated set of variables that correlate with disease volume. Here, we developed an ML model for feature selection using same-day data available in the patient chart. These features could be used, in order of importance, to provide focused, comprehensive prompts aligned with the patient's context. In future studies, the quality of AI-assisted disease assessment using these models should be compared with assessments performed by real-world providers to ensure that LLMs generate written assessments that accurately reflect the patient's health status.