Abstract: High-throughput DNA sequencing technologies have enabled unbiased screening of genomic alterations such as single nucleotide variants (SNVs) and small insertions/deletions (indels). However, variant analysis using transcriptomic sequencing (RNA-Seq) data has not become standard because it is difficult to distinguish true variants, in particular indels, from artifacts that can arise from RNA-Seq mapping and library preparation. We have previously developed a tool, RNAIndel, which classifies indels as somatic, germline, or artifact using a machine learning-based model. Here, we present enhanced somatic indel discovery that combines tumor RNA-Seq data with conventional paired tumor-normal DNA-Seq, based on a re-analysis of The Cancer Genome Atlas (TCGA) data using RNAIndel. Our analytic process involves running RNAIndel on 9,101 TCGA tumor RNA-Seq samples across 33 cancer types hosted on the Cancer Genomics Cloud by Seven Bridges Genomics (https://www.cancergenomicscloud.org) and comparing the results with the reference somatic indel set generated from paired tumor/normal whole-exome sequencing (WES) data using an ensemble caller developed by the NCI Genomic Data Commons (GDC). The comparison validates the RNAIndel variants by matching them to the corresponding WES dataset. We also use the GDC indel set to test whether those indels are expressed in RNA-Seq. Indel alignments across sequencing platforms were compared using indelPost, an algorithm we developed that performs indel realignment to overcome the differences in mapping algorithms (BWA vs. STAR) and read lengths (100-bp vs. 50-bp) between WES and RNA-Seq data. The joint DNA/RNA indel analysis provides important insights into the expression profile of indels in known cancer driver genes versus non-drivers. First, indels in driver genes are more likely to be expressed than those in non-drivers (79% vs. 50%, p < 2.2 × 10⁻¹⁶).
Second, mutant allele expression is frequently upregulated for truncating indels in tumor-suppressor genes (TSGs): allelic imbalance was detected in 6% of such variants in TSGs vs. 2% in non-drivers. This may indicate a potential second hit leading to loss of the wild-type allele in the TSGs. Third, allelic imbalance for in-frame indels was also more frequent in oncogenes (20%) than in non-drivers (7%). The high expression likelihood of driver genes and the enriched indel allele expression for TSGs and oncogenes both support the use of RNA-Seq to enhance driver indel discovery. Indeed, RNA-Seq analysis reveals ~10% additional driver indels that are absent from the GDC reference indel set. These indels affect prominent driver genes, most frequently CDKN2A, TP53 and ARID1A. Our experience argues for incorporating de novo indel analysis in RNA-Seq as a standard approach for future cancer genomic analysis.

Citation Format: Kohei Hagiwara, Andrew Thrasher, Jinghui Zhang. Driver indel discovery and allelic imbalance in >9,000 tumor RNA-Seq samples from the Cancer Genome Atlas (TCGA) [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6252.