Abstract

Insertions and deletions (indels) in the transcriptome can have a significant impact on function and can be clinically actionable. A number of methods have recently been developed that improve indel detection in DNA by using realignment and/or localized assembly to discover mutations that are more difficult to detect than single nucleotide polymorphisms. While significant effort has been put into these methods for DNA, RNA-Seq has not received the same attention. For example, the widely used Genome Analysis Toolkit (GATK) recommends removing splice junctions from all reads before proceeding with variant calling methods that are very similar to those used for DNA. We hypothesize that using splice junctions to augment localized assembly across exon-exon boundaries can improve read alignments, resulting in improved variant detection in RNA-Seq data. We have developed an update to the Assembly Based ReAligner (ABRA2) that uses splice junction information to aid in the realignment of reads.

We assessed indel detection performance using the BEERs RNA-Seq simulator. Two million simulated reads of length 100 nt were generated across 1,000 human genes. To test detection of more complex indels, the simulator was modified to generate an even distribution of 1,391 indels of length 1 to 100 bases. The simulated reads were aligned using STAR and subsequently realigned using ABRA2. FreeBayes and GATK run on the non-realigned STAR output achieved indel detection sensitivities of 15% and 18%, with precisions of 97% and 88%, respectively. FreeBayes run on the ABRA2 realignments achieved an indel detection sensitivity of 67%, an approximately 4-fold increase, while maintaining a precision of 97%. In-frame deletions in the Epidermal Growth Factor Receptor (EGFR) gene have oncogenic potential, can serve as indicators for gefitinib or erlotinib treatment, and are frequently detected in lung cancer via DNA sequencing.
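The simulation results above are stated as sensitivity and precision, and the EGFR findings depend on whether an indel is in frame. The short Python sketch below makes both ideas concrete. The function names are hypothetical illustrations and are not taken from ABRA2, FreeBayes, or GATK. The figure of roughly 932 detected indels is derived here from the rounded 67% sensitivity and the 1,391 simulated indels; it is not a count reported in the abstract.

```python
# Hypothetical helpers illustrating the evaluation metrics and the in-frame rule
# discussed in the abstract; they are not part of ABRA2, FreeBayes, or GATK.


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true indels that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of reported indels that are real: TP / (TP + FP)."""
    return tp / (tp + fp)


def fold_change(before: float, after: float) -> float:
    """Ratio of a metric after realignment to the same metric before it."""
    return after / before


def is_in_frame(indel_length: int) -> bool:
    """An indel preserves the reading frame when its length is a multiple of 3."""
    return indel_length > 0 and indel_length % 3 == 0


if __name__ == "__main__":
    # Reported FreeBayes sensitivities: 15% on STAR output, 67% after ABRA2.
    print(f"fold change: {fold_change(0.15, 0.67):.2f}")  # prints 4.47

    # Approximate detected count implied by 67% of 1,391 simulated indels
    # (derived from rounded percentages, so only an estimate).
    detected = round(0.67 * 1391)
    print(f"approx. detected: {detected}")
    print(f"sensitivity: {sensitivity(detected, 1391 - detected):.2f}")

    # The 9- and 15-base EGFR deletions are in frame; lengths 1 and 100 are not.
    for length in (1, 9, 15, 24, 100):
        print(length, is_in_frame(length))
```

The fold change of about 4.5 is consistent with the "approximately 4-fold" increase reported for FreeBayes; the in-frame check is simply a divisibility test, which is why the 3- to 24-base EGFR indels all leave the downstream protein sequence in frame.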
Among 514 TCGA Lung Adenocarcinoma RNA-Seq samples, ABRA2 revealed 73 in-frame coding indels in EGFR ranging in length from 3 to 24 bases, including 58 deletions of 15 bases in exon 19. This represents a 3-fold increase relative to what TCGA initially reported from DNA. Further, without ABRA2, only 5 in-frame coding indels are detected in RNA, the largest of which is a 9-base deletion. We have presented ABRA2, a new version of the Assembly Based ReAligner that can accurately realign RNA-Seq reads containing variations that widely used aligners and variant callers currently handle poorly, thereby improving the accuracy of variant detection in RNA-Seq.

Citation Format: Lisle E. Mose, D. Neil Hayes, Charles M. Perou, Joel S. Parker. Improved indel detection in RNA-seq data via assembly based re-alignment reveals expressed Epidermal Growth Factor Receptor indels in Lung Adenocarcinoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3592. doi:10.1158/1538-7445.AM2017-3592
