Cross-dataset Analysis Research Articles

Abstract: Predictive modeling holds great promise for improving personalized cancer treatment and the efficiency of drug development. In recent years, deep learning (DL) has been extensively explored for drug response prediction (DRP), outperforming classical machine learning in generalizing predictions to new data. Despite the considerable interest in DRP, no agreed-upon methodology yet exists for evaluating and comparing the diverse DL models. Existing papers generally demonstrate the performance of proposed models using cross-validation within a single cell line dataset and compare against baseline models of their own choosing, which substantially limits the scope and validity of model evaluation and comparison. In this work, we investigate the ability of DRP models to generalize predictions across datasets from multiple drug screening studies, a more challenging scenario that mimics practical applications of DRP models. Five cell line datasets and six community DRP models with advanced DL architectures have been explored. Public cell line drug screening datasets have been curated and processed for this analysis: CCLE, CTRP, GDSC1, GDSC2, and gCSI. For each dataset, the same preprocessing pipeline was used to generate cell line gene expressions, drug representations, and drug response values. The six DRP models include advanced architectures and feature engineering methods such as transformers, graph neural networks, and image representations of tabular data. Systematic model curation and training have been applied, including consistent training and testing data splits across models and hyperparameter optimization (HPO). To cope with the large-scale model training and HPO, automatic workflows have been implemented and executed on high-performance computing systems.

A 5-by-5 matrix of prediction scores, with the five datasets along both the row and column dimensions, has been generated for each model; the off-diagonal values represent cross-dataset generalization. Despite the advanced DL techniques, all models perform substantially worse in cross-dataset analysis than in cross-validation within a single dataset. This result demonstrates the challenge of cross-dataset generalization for DRP and motivates the need for rigorous and systematic evaluation of DRP models that simulates real-world applications.

Citation Format: Alexander Partin, Thomas S. Brettin, Yitan Zhu, Jamie Overbeek, Oleksandr Narykov, Priyanka Vasanthakumari, Austin Clyde, Sara E. Jones, Satishkumar Ranganathan Ganakammal, Justin M. Wozniak, Andreas Wilke, Jamaludin Mohd-Yusof, Michael R. Weil, Alexander T. Pearson, Rick L. Stevens. Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5380.
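The 5-by-5 score matrix described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, a closed-form ridge regression stands in for the six DL models, R² is used as the prediction score, and the helper names (`make_toy_datasets`, `cross_dataset_matrix`) are hypothetical. Only the five dataset names come from the abstract. Row i, column j holds the score of a model trained on dataset i and evaluated on a held-out split of dataset j, so the diagonal is within-dataset performance and the off-diagonal entries are cross-dataset generalization.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is also penalized for brevity."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def make_toy_datasets(seed=0, n=200, d=10):
    """Synthetic stand-ins: a shared signal plus a dataset-specific shift."""
    rng = np.random.default_rng(seed)
    w_shared = rng.normal(size=d)
    data = {}
    for name in ["CCLE", "CTRP", "GDSC1", "GDSC2", "gCSI"]:
        w = w_shared + 0.5 * rng.normal(size=d)   # dataset-specific weights
        X = rng.normal(size=(n, d))
        y = X @ w + 0.1 * rng.normal(size=n) + rng.normal()  # noise + offset
        data[name] = (X, y)
    return data

def cross_dataset_matrix(datasets, test_fraction=0.2, seed=0):
    """Train on each source's training split; score on every target's test split."""
    rng = np.random.default_rng(seed)
    names = list(datasets)
    splits = {}
    for name in names:
        n = len(datasets[name][1])
        idx = rng.permutation(n)
        n_test = max(1, int(test_fraction * n))
        splits[name] = (idx[n_test:], idx[:n_test])  # (train, test) indices
    scores = np.zeros((len(names), len(names)))
    for i, src in enumerate(names):
        Xs, ys = datasets[src]
        train_idx, _ = splits[src]
        w = fit_ridge(Xs[train_idx], ys[train_idx])
        for j, tgt in enumerate(names):
            Xt, yt = datasets[tgt]
            _, test_idx = splits[tgt]
            scores[i, j] = r2_score(yt[test_idx], predict(w, Xt[test_idx]))
    return names, scores

if __name__ == "__main__":
    names, scores = cross_dataset_matrix(make_toy_datasets())
    print(names)
    print(np.round(scores, 2))
```

Fixing the test split of each target dataset once, and reusing it for every source model, keeps the columns of the matrix comparable, which mirrors the consistent data splits the abstract mentions.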


Abstract #2036: Genome scale analysis by microarrays or array CGH holds the promise to identify novel subclasses of breast cancer or characterize functional modules of clinical relevance in a robust fashion. However, the success of these endeavors is highly dependent on several factors such as the general noise structure and noise level of high throughput measurements or the strength of association between biomarkers/gene modules and clinical outcome. We will address two issues deeply rooted in the high throughput nature of genome scale profiling, which are relevant for the accurate analysis of clinical microarray data sets: systematic bias in clinical microarray data and establishing strategies that allow extracting robust, convergent information/classification from genome scale molecular profiling of breast cancer.

As will be demonstrated, clinical microarray data are burdened with a high level of systematic bias. We identified sources of technical bias affecting many genes in concert, thus causing spurious correlations in clinical data sets and false associations between genes and clinical variables. A method will be presented to correct for technical bias in clinical microarray data, which increased concordance with known biological relationships in multiple data sets.

Gene expression profiling based classification of breast cancer and prognostic or clinical response associated gene expression signatures are usually derived from a single data set. However, any result extracted from a single data set will reflect to a large extent the technical (which genes are measured reliably on the microarray) and biological (such as cohort selection) bias of the given cohort. An alternative approach uses multiple, in this case 5 different, analogous clinical data sets and determines the robust, convergent information emerging from the cross data-set analysis. We will present such a method, which will reduce the impact of data set specific bias and outline robust functional modules in breast cancer, ultimately leading to the reevaluation of gene expression profiling based subtyping and diagnostic gene expression signatures in breast cancer.

Finally, we will present evidence for the existence of at least two fundamentally different types of genome instability in breast cancer with direct implications for response to chemotherapy.

Citation Information: Cancer Res 2009;69(2 Suppl):Abstract nr 2036.
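The idea of keeping only information that converges across several data sets can be sketched as follows. This is not the method of the abstract above, which gives no algorithmic detail. It is a simplified illustration under stated assumptions: the function name `convergent_pairs`, the Pearson-correlation criterion, and the 0.5 threshold are all hypothetical. The sketch keeps a gene pair only if its correlation is strong and has the same sign in every data set, so a correlation that appears in a single data set (for example, from a technical artifact) is discarded.

```python
import numpy as np

def convergent_pairs(expr_by_dataset, threshold=0.5):
    """Return gene index pairs (i, j), i < j, whose Pearson correlation
    has the same sign and |r| >= threshold in every data set.

    expr_by_dataset: list of arrays, each (n_samples, n_genes), with the
    same gene order in every data set (an assumption of this sketch).
    """
    # One gene-by-gene correlation matrix per data set.
    corr = np.stack([np.corrcoef(E, rowvar=False) for E in expr_by_dataset])
    strong = np.all(np.abs(corr) >= threshold, axis=0)
    same_sign = np.all(corr > 0, axis=0) | np.all(corr < 0, axis=0)
    # Upper triangle only, so each pair is reported once and i != j.
    keep = np.triu(strong & same_sign, k=1)
    return [(int(i), int(j)) for i, j in np.argwhere(keep)]

def make_toy_expression(seed=0, n_datasets=5, n_samples=100):
    """Synthetic data: genes 0 and 1 co-vary in every data set; genes 2
    and 3 co-vary only in the first one (a data-set-specific artifact)."""
    rng = np.random.default_rng(seed)
    datasets = []
    for k in range(n_datasets):
        base = rng.normal(size=n_samples)
        g0 = base + 0.1 * rng.normal(size=n_samples)
        g1 = base + 0.1 * rng.normal(size=n_samples)
        g2 = rng.normal(size=n_samples)
        if k == 0:
            g3 = g2 + 0.1 * rng.normal(size=n_samples)
        else:
            g3 = rng.normal(size=n_samples)
        datasets.append(np.column_stack([g0, g1, g2, g3]))
    return datasets

if __name__ == "__main__":
    print(convergent_pairs(make_toy_expression()))
```

Requiring agreement in all data sets is the strictest possible rule; a real analysis might instead require agreement in most data sets or combine effect sizes, which this sketch does not attempt.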

Articles published on Cross-dataset Analysis

Indigeneity in Context—Evolving Maya Ch’orti’ Notions of Cultural Identity: A Qualitative Study From 1993 to 2019

Evaluation of Normalization Algorithms for Breast Mammogram Mass Segmentation

Markov-based Neural Networks for Heart Sound Segmentation: Using domain knowledge in a principled way.

Towards Equitable Healthcare: A Cross Dataset Analysis of Healthcare and Telehealth Access

Abstract 5380: Systematic evaluation and comparison of drug response prediction models: a case study of prediction generalization across cell lines datasets

Cross Dataset Analysis for Generalizability of HRV-Based Stress Detection Models.

The value of cross-data set analysis for automobile insurance fraud detection

ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence

Resting-state functional connectivity of the human hippocampus in periadolescent children: Associations with age and memory performance.

The FaceChannel: A Fast and Furious Deep Neural Network for Facial Expression Recognition

Exploring Integrative Analysis Using the BioMedical Evidence Graph.

COVID-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis.

Toward Unbiased Facial Expression Recognition in the Wild via Cross-Dataset Adaptation

A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.

An integrative analysis of regional gene expression profiles in the human brain

The CARMEN data sharing portal project: what have we learned?

Points of Interest and Visual Dictionaries for Automatic Retinal Lesion Detection

DiseaseMeth: a human disease methylation database

Identification of robust, clinically relevant phenotypes of breast cancer from genome scale molecular profiling.
