Omics Measurements Research Articles

Abstract

Background: The StepIdent study aims to develop a gene signature that predicts metastasis in patients with cutaneous squamous cell carcinoma (cSCC) to improve risk stratification, thus enabling personalized decisions about follow-up schedules and treatment options. Here we describe the unique characteristics, challenges, and best practices for efficiently designing a discovery cohort for a rare outcome (metastasis prevalence: 2-5%); for retrieving, curating, and linking the clinical and pathological data through nationwide databases; and for measuring gene expression by sequencing archived Formalin-Fixed Paraffin-Embedded (FFPE) primary tumor samples.

Methods: Following a predefined protocol, we identified a nested case-control (NCC) cohort of 305 cases and 305 controls from a nationwide cohort of 19,120 patients with a first cSCC in the Netherlands from 2007 to 2009, followed up until 2020. We chose an NCC design because it is efficient when the outcome is rare, although weighting is needed to account for the under-sampling of the controls. Patients were identified from the Dutch National Cancer Registry (NCR), and the clinical information was retrieved from the NCR, which is linked to the nationwide registry of histo- and cytopathology (PALGA). Tumor blocks were requested from PALGA, and pathological characteristics were assessed by dermatopathologists. We matched controls to cases based on a risk score estimated by a clinicopathological model. Gene expression was measured using the Illumina RNA Prep with Enrichment kit combined with the whole exome panel, with paired-end sequencing on the NextSeq 550.

Results: Tissue slides for 541 samples were retrieved for sequencing. Of these, 151 samples were excluded after pathology review or due to low pre-library concentration. The final cohort includes 195 case-control pairs (n=390). The median sequencing depth was 43M (Q1-Q3: 35-52M); the median Q30 was 85% (Q1-Q3: 83-87%); the median GC content was 51% (Q1-Q3: 50-52%); a median of 1.8% of base pairs (Q1-Q3: 1.4-2.1%) was trimmed prior to mapping/alignment; a median of 69% (Q1-Q3: 65-74%) of reads were aligned as protein-coding and a median of 7% (Q1-Q3: 6-10%) as rRNA; a median of 95% (Q1-Q3: 93-96%) of reads were aligned by STAR. Two samples were excluded based on quality control.

Conclusion: We described an efficient design and implementation of a nationwide discovery study in cSCC, involving the retrieval of clinicopathological data, the collection of FFPE materials, and the execution of omics measurements. This study presents the largest cohort to date that combines omics measurements of primary cSCC samples with simultaneous access to well-curated clinical and pathological information and follow-up data. Our findings can provide guidance for similar studies involving a rare clinical endpoint, where an efficient study design is a necessity.

Citation Format: Barbara Rentroia-Pacheco, Lara Pozza, Yan Ting Chen, Daphne Huigh, Celeste J. Eggermont, Olivia FM Steijlen, Sheril Alex, Jvalini Dwarkasing, Domenico Bellomo, Harmen JG van de Werken, Antien L. Mooyaart, Marlies Wakkee, Loes M. Hollestein. Efficient study design for the discovery of a gene expression signature predicting metastasis in cutaneous squamous cell carcinoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4869.
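The weighting that the nested case-control design calls for can be illustrated with a toy calculation. This sketch is not the StepIdent procedure, which the abstract does not specify. It assumes a simple inverse-sampling-probability scheme within hypothetical risk-score strata, where each sampled control stands in for the eligible controls it represents. The function name, stratum labels, and counts are all illustrative assumptions.

```python
from collections import Counter


def inverse_sampling_weights(cohort_strata, sampled_strata):
    """Return one weight per stratum for the sampled controls.

    cohort_strata:  stratum label of every eligible control in the full cohort
    sampled_strata: stratum label of every control actually sampled
    Weight = eligible controls in the stratum / controls sampled from it,
    i.e. the inverse of the (assumed) sampling fraction.
    """
    cohort_counts = Counter(cohort_strata)
    sampled_counts = Counter(sampled_strata)
    return {
        stratum: cohort_counts[stratum] / n_sampled
        for stratum, n_sampled in sampled_counts.items()
    }


# Hypothetical toy cohort: 90 low-risk and 10 high-risk eligible controls,
# of which 3 low-risk and 5 high-risk controls were sampled.
toy_cohort = ["low"] * 90 + ["high"] * 10
toy_sample = ["low"] * 3 + ["high"] * 5
weights = inverse_sampling_weights(toy_cohort, toy_sample)
print(weights)
```

Cases would typically keep a weight of 1 when all of them are included. Matching on a risk score tends to over-represent high-risk controls, which is why a sampled low-risk control carries a larger weight in this toy example.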

Abstract

Despite large-scale efforts to measure drug effects in cancer cell line screens, mapping those effects to patient samples has been a challenge. Biological differences between cell lines and patients (such as the lack of an immune system or microbiome), in-vitro survival adaptations, and biases in measurement technologies create differences across sample modalities that can confound analysis, including prediction with machine learning. In this work, we propose a multiway batch correction strategy to enable algorithmic prediction of tumor drug response across model systems and patient data.

Recent advances in batch correction algorithms have been motivated by the need to correct for batch effects in single-cell omics and include diverse approaches such as variational autoencoders (VAEs) and generative adversarial networks (GANs). Given the successes of these generative deep learning methods in single-cell sequencing analysis, we worked to employ similar approaches to correct large omics measurements across various cancer datasets. Here, we describe mapping of datasets from diverse data sources and model systems to the same space, so that a predictive model of drug response built in a system such as cell lines can be used in biologically relevant models such as organoids, patient-derived xenografts, and tumor data. Specifically, we introduce a modified loss function in a VAE using cosine similarity distance to minimize the effect of different cancer model systems in predicting cancer types. We evaluate the method on standard data types for drug response prediction: gene expression, copy number variation, and protein abundance. For this method, the cosine similarity is added as an additional term to the VAE reconstruction and Kullback-Leibler divergence loss terms. This injects a quantification of the dissimilarity between the tumor and tumor model distributions into the backpropagation and gradient descent that update the model parameters, resulting in an encoded representation of the data in which the effect of data source has been attenuated while the phenotypic signal is preserved. We evaluate how well our approach preserves biological signal while reducing model system-specific noise, using logistic regression and Euclidean distance. Our results show that the proposed VAE can effectively correct for platform effects and improve the accuracy of downstream integrative analyses. This study has the potential to improve the accuracy and translatability of proteogenomic drug response studies. The proposed modified VAE could be used to correct for platform effects in a variety of datasets, including those from different studies, different platforms, and different cancer types. This could lead to new insights into cancer biology, calibration of cancer patient digital twins, and the development of new diagnostic and therapeutic strategies.

Citation Format: Brian Karlberg, Raphael Kirchgaessner, Jeremy R. Jacobson, Kyle Ellrott, Sara J. Gosline. Tumor model to tumor treatment: Applying deep learning approaches to map multimodal data from cancer model systems to patients [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7393.
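The three-term loss described in this abstract (reconstruction, Kullback-Leibler divergence, and a cosine term) can be sketched numerically. The abstract does not say which quantities the cosine term compares, so the sketch below assumes it is the cosine distance between the mean latent vectors of patient samples and model-system samples. The function names, the `weight` parameter, and the use of mean squared error for reconstruction are illustrative assumptions, not the authors' implementation, and a real model would compute this in an autodiff framework so that gradients flow to the encoder.

```python
import numpy as np


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity between two vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v) + eps
    return 1.0 - float(np.dot(u, v) / denom)


def vae_loss_with_cosine(x, x_hat, mu, logvar, is_patient, weight=1.0):
    """Reconstruction + KL + weighted cosine term (assumed formulation).

    x, x_hat:    (n_samples, n_features) inputs and reconstructions
    mu, logvar:  (n_samples, n_latent) encoder outputs
    is_patient:  boolean mask, True for patient samples, False for model systems
    """
    is_patient = np.asarray(is_patient, dtype=bool)

    # Reconstruction: squared error summed over features, averaged over samples.
    recon = float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

    # KL divergence between N(mu, exp(logvar)) and the standard normal prior.
    kl = float(np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)))

    # Assumption: penalize the angle between the two data sources' latent centroids.
    centroid_patient = mu[is_patient].mean(axis=0)
    centroid_model = mu[~is_patient].mean(axis=0)
    cos = cosine_distance(centroid_patient, centroid_model)

    total = recon + kl + weight * cos
    return total, {"recon": recon, "kl": kl, "cosine": cos}


# Toy batch: perfect reconstruction, orthogonal latent centroids per source.
x = np.zeros((4, 3))
mu = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
logvar = np.zeros_like(mu)
total, parts = vae_loss_with_cosine(x, x, mu, logvar, [True, True, False, False])
print(total, parts)
```

Under this assumed formulation, the cosine term shrinks as the two sources' latent centroids point in the same direction, which is one way the effect of data source could be attenuated in the encoded representation.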

Articles published on Omics Measurements

DeePathNet: a transformer-based deep learning model integrating multi-omic data with cancer pathways.

The Addition of Transcriptomics to the Bead-Enabled Accelerated Monophasic Multi-Omics Method: A Step toward Universal Sample Preparation.

Bootstrap Evaluation of Association Matrices (BEAM) for Integrating Multiple Omics Profiles with Multiple Outcomes.

Deciphering spatial domains from spatial multi-omics with SpatialGlue

#894 Dynamic multi-omics and mechanistic modelling approach provides deeper insight into kidney fibrosis progression

Preclinical side effect prediction through pathway engineering of protein interaction network models.

Abstract 4869: Efficient study design for the discovery of a gene expression signature predicting metastasis in cutaneous squamous cell carcinoma

Abstract 7393: Tumor model to tumor treatment: Applying deep learning approaches to map multimodal data from cancer model systems to patients

Benchmarking feature selection and feature extraction methods to improve the performances of machine-learning algorithms for patient classification using metabolomics biomedical data

Functional and multi-omics signatures of mitapivat efficacy upon activation of pyruvate kinase in red blood cells from patients with sickle cell disease.

The Effect of 8-Week Protein Supplementation with a Simple Exercise Program on Body Composition, Muscle Strength, and Amino Acid OMICS among Healthy Sedentary Indians: A Randomized, Double-Blind, Placebo-Controlled Trial.

A machine‐learning approach to biomarker evaluation for AD precision medicine

Multi-omic prediction of incident type 2 diabetes

Sample Preparation Method for MALDI Mass Spectrometry Imaging of Fresh-Frozen Spines.

Evaluation of input data modality choices on functional gene embeddings

Multi-omics characterization of NIST seafood reference materials and alternative matrix preparations.

Microbiability of milk composition and genetic control of microbiota effects in sheep

Aligned deep neural network for integrative analysis with high-dimensional input

Single-cell technologies for multimodal omics measurements

Multiomic signatures of body mass index identify heterogeneous health phenotypes and responses to a lifestyle intervention
