Batch Correction Methods Research Articles

Abstract

Purpose: Develop a Shiny application to help integrate cancer datasets and guide researchers in selecting an appropriate method of correction for their technical artifacts.

Description: Integrative analysis of heterogeneous expression data remains challenging due to variations in platform, RNA quality, sample processing, and other unknown technical effects. As the field performs omics profiling of samples from cancer patients and murine models, there is a need to identify, harmonize, and correct these technical effects to ensure robust analysis of the treatment and condition effects of the underlying genetics or biological events. However, selecting and implementing different approaches for removing unwanted batch effects can be a time-consuming and tedious process, especially for more biologically focused investigators. In this project, we present BATCH-FLEX, a Shiny app to rapidly visualize batch correction by established methods such as ComBat, mean centering, ComBat-seq, and limma removeBatchEffect. With BATCH-FLEX, users can visualize the contribution of a factor to the variance before and after correction using principal component analysis, relative log expression plots, heatmaps, and explanatory variables. Users can also save all plots and matrices as a single ZIP file for further downstream analysis.

Results: As a proof of concept, we assessed BATCH-FLEX using simulated data generated from a linear model framework introduced by Gagnon-Bartsch and Speed, which assumes that gene expression measurements can be distilled to a combination of the biological signal, systematic noise, and random noise. BATCH-FLEX successfully identified and removed the introduced effect using each of the batch correction methods listed above. Next, we evaluated BATCH-FLEX using a comprehensive collection of bladder cancer data consisting of microarray data from 13 studies spanning 1452 samples. After study-dependent noise was removed, BATCH-FLEX revealed the heterogeneity among bladder cancers based on known sample type annotations.

Conclusion: We have developed BATCH-FLEX, a tool for oncologic researchers to rapidly assess, select, and implement commonly used batch correction methods. This tool is available at https://github.com/shawlab-moffitt/BATCH-FLEX. Our integrative web portal of a Bladder Cancer Resource for Translational Science (BEACON) will also be shared at the meeting.

Citation Format: Joshua Davis, Alyssa Obermayer, Thac Duong, Rebecca Hesterberg, Xuefeng Wang, Mingxiang Teng, G. Daniel Grass, Timothy Shaw. BATCH-FLEX: Feature-level equalization of x-batch in heterogeneous cancer data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7423.
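
The proof-of-concept described in the abstract above (simulate expression as biological signal plus systematic noise plus random noise, then remove the batch term) can be illustrated with a minimal sketch. This is not the BATCH-FLEX implementation: the function names, effect sizes, sample counts, and the balanced design are all assumptions chosen for illustration, and only mean centering is shown because it is the simplest of the methods the abstract lists.

```python
import numpy as np


def simulate_expression(n_samples=60, n_genes=100, n_batches=3, seed=0):
    """Simulate a samples x genes matrix as signal + batch shift + noise.

    All sizes and effect magnitudes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n_samples)
    batch = idx % n_batches                 # batch label per sample
    group = (idx // n_batches) % 2          # biological condition, balanced within batches
    beta = rng.normal(0.0, 1.0, n_genes)    # per-gene biological effect
    alpha = rng.normal(0.0, 2.0, (n_batches, n_genes))  # per-batch, per-gene shift
    signal = np.outer(group, beta)          # biological signal
    systematic = alpha[batch]               # systematic (batch) noise
    noise = rng.normal(0.0, 0.5, (n_samples, n_genes))  # random noise
    return signal + systematic + noise, batch, group


def mean_center_by_batch(expr, batch):
    """Subtract each gene's within-batch mean from the samples in that batch."""
    corrected = expr.copy()
    for b in np.unique(batch):
        rows = batch == b
        corrected[rows] -= corrected[rows].mean(axis=0)
    return corrected


def batch_mean_spread(expr, batch):
    """Average, over genes, of the standard deviation of per-batch means."""
    means = np.stack([expr[batch == b].mean(axis=0) for b in np.unique(batch)])
    return float(means.std(axis=0).mean())


expr, batch, group = simulate_expression()
corrected = mean_center_by_batch(expr, batch)
print("spread of batch means before:", batch_mean_spread(expr, batch))
print("spread of batch means after: ", batch_mean_spread(corrected, batch))
```

Because the simulated condition is balanced within every batch, mean centering leaves the between-condition difference unchanged here. In a confounded design, where batch and condition overlap, the same step would also remove biological signal, which is one reason to compare several correction methods before choosing one.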

Abstract

Gene expression profiling is widely used in oncology research and in clinical settings for decision making. Despite the cross-platform correlation of gene expression values, each measurement should ideally be evaluated against a cohort of samples sequenced using the same methodology. Clinical samples, preserved as FFPE tissue, often undergo exome capture-based RNA-seq; research samples, stored as fresh/frozen (FF) tissue, undergo poly-A RNA-seq, producing high-quality expression data. Thus, development of sequencing protocols and data processing algorithms is necessary to provide gene expression measurements of the same quality from FFPE samples. Further, while several batch effect correction algorithms exist to neutralize the batch effect between samples across large cohorts, the majority cannot be applied to an individual sample, raising the need for a single-sample projection algorithm to improve gene expression-based personalized clinical decision-making. To improve the quality of RNA reads from FFPE tissues, exome capture enrichment of RNA transcripts was optimized, and the concordance with poly-A RNA-seq was increased by adding noncoding 3′ and 5′ UTR regions to the probes. After testing the performance of multiple extraction methods, a 0.88 correlation was achieved between exome capture-based and poly-A RNA-seq protocols. To further align the sequencing methodologies, we designed an ML-based batch correction algorithm by performing a series of paired RNA-seq experiments from the same sample using exome capture-based and poly-A RNA-seq; we applied linear modeling on the training subset (N = 64) and verified the performance on the validation subset (N = 24). For each gene, 5-20 correlated genes belonging to the TCGA combined pan-cancer datasets were selected and trained using the Lasso model. Over 82% of genes (total N = 20,062) correlated across the two RNA-seq methodologies for each sample after correction (ccc value > 0.5), and approximately 94% of cancer-specific and microenvironment-related genes correlated (ccc value > 0.5). The algorithm significantly outperformed other batch correction methods, with ccc values > 0.8 for 51.37% of the 20,062 genes compared with ~3% for PCA, 26% for MNN, and 28% for ComBat. Our algorithm also showed improved performance on the 1,890 clinically relevant genes, correcting 77% of them (ccc values > 0.8) compared with 15% for PCA, 39% for MNN, and 40% for ComBat. Here, we developed a combinatory technology with a batch correction algorithm trained and developed on FFPE or FF tumor samples, using exome capture-based sequencing or poly-A RNA-seq, that enables the projection of a single sample onto a larger cohort. Future application of this correction tool will enable direct analysis of gene expression of single tumor samples to support potential gene expression-based treatment decisions.

Citation Format: Nikita Kotlov, Kirill Shaposhnikov, Cagdas Tazearslan, Ilya Cheremushkin, Madison Chasse, Artur Baisangurov, Svetlana Podsvirova, Svetlana Korkova, Yaroslav Lozinsky, Katerina Nuzhdina, Elena Vasileva, Dmitry Kravchenko, Krystle Nomie, John Curran, Nathan Fowler, Alexander Bagaev. Combinatory technologies for single sample gene expression projection onto a cohort sequenced with a different technology for personalized clinical decision-making [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1216.
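
The evaluation in the abstract above reports the share of genes whose "ccc value" exceeds a threshold. The abstract does not expand the abbreviation; the sketch below assumes it refers to Lin's concordance correlation coefficient, computed per gene across paired measurements of the same samples. The function names and the toy matrices are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np


def per_gene_ccc(a, b):
    """Lin's concordance correlation coefficient for each gene (column).

    a, b: samples x genes matrices holding paired measurements of the
    same samples under two protocols (assumed layout).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Population (biased) covariance and variances, as in Lin's definition.
    cov = ((a - mean_a) * (b - mean_b)).mean(axis=0)
    return 2.0 * cov / (a.var(axis=0) + b.var(axis=0) + (mean_a - mean_b) ** 2)


def fraction_above(ccc, threshold):
    """Share of genes whose concordance exceeds the threshold."""
    return float((np.asarray(ccc) > threshold).mean())


# Toy example: gene 0 agrees perfectly, gene 1 carries a constant offset of 1.
protocol_a = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
protocol_b = protocol_a.copy()
protocol_b[:, 1] += 1.0

ccc = per_gene_ccc(protocol_a, protocol_b)
print(ccc)
print(fraction_above(ccc, 0.8))
```

Unlike Pearson correlation, this coefficient penalizes a constant offset between protocols: the second toy gene is perfectly correlated but falls below the 0.8 threshold. That property makes it a natural choice when the goal is agreement of absolute expression values rather than agreement of trends.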

Articles published on Batch Correction Methods

Batch effects correction in scRNA-seq based on biological-noise decoupling autoencoder and central-cross loss

Evaluating batch correction methods for image-based cell profiling

Evaluation of normalization methods for predicting quantitative phenotypes in metagenomic data analysis.

Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data

Comparison of the effectiveness of different normalization methods for metagenomic cross-study phenotype prediction under heterogeneity

Abstract 7423: BATCH-FLEX: Feature-level equalization of x-batch in heterogeneous cancer data

Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data

A Joint Batch Correction and Adaptive Clustering Method of Single-Cell Transcriptomic Data

Comparative analysis of batch correction methods for FDG PET/CT using metabolic radiogenomic data of lung cancer patients

Exploratory optimisation of a LC-HRMS based analytical method for untargeted metabolomic screening of Cannabis Sativa L. through Data Mining

Inferring single-cell transcriptomic dynamics with structured latent gene expression dynamics

MalbacR: A Package for Standardized Implementation of Batch Correction Methods for Omics Data.

Single cell transcriptome sequencing of stimulated and frozen human peripheral blood mononuclear cells

Integrating Multiple Single-Cell RNA Sequencing Datasets Using Adversarial Autoencoders.

CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity.

Batch correction methods for nontarget chemical analysis data: application to a municipal wastewater collection system

SelectBCM tool: a batch evaluation framework to select the most appropriate batch-correction methods for bulk transcriptome analysis.

Abstract 1216: Combinatory technologies for single sample gene expression projection onto a cohort sequenced with a different technology for personalized clinical decision-making

IMGG: Integrating Multiple Single-Cell Datasets through Connected Graphs and Generative Adversarial Networks.

AMDBNorm: an approach based on distribution adjustment to eliminate batch effects of gene expression data.
