Abstract

Cancer genome sequencing is a widely adopted approach for identifying critical genetic variants and, increasingly, for diagnostic applications in precision medicine. Accurate analysis of cancer genome sequencing data is essential yet remains challenging: the reliable identification of somatic variants continues to be a significant problem. One source of error is the mapping and alignment of sequence reads, which relies entirely on the human reference genome, a static data set consisting of one character string per chromosome. The current reference is derived from only a few individuals and therefore does not account for the full diversity or complex structure of human genomes. As a result, critical features of cancer genomes are missed or called with errors. As a solution, we developed dynamic genomic indexing (DGI), which identifies somatic variants by using the frequency of short sequences that are 20 to 200 bases in length (20- to 200-mers) in paired tumor/normal sequencing data. Any somatic variant will affect the frequency of such short sequences. In this study, DGI relies on the presence of neo-20mers, generated by point mutations and insertions/deletions (indels), that differ in frequency between normal and tumor sequence reads. These 20mers are unique sequences that arise from cancer genetic variants. First, DGI identifies four groups of differential 20mers whose frequencies differ significantly between tumor and normal: tumor specific (TS), normal specific (NS), tumor dominant (TD), and normal dominant (ND). Second, within each group, DGI clusters differential 20mers with neighboring 20mers that share a 19 bp overlap. Third, clustered TS 20mers are matched to clustered NS or ND 20mers. The number of 20mer clusters represents the number of mutational events, and the ratio between the frequency of TS 20mers and that of the matched ND 20mers indicates allelic depth. All calculations are conducted in comparison to matched normal samples.
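The first two steps described above (grouping differential 20mers, then clustering neighbors that overlap by 19 bases) can be illustrated with a minimal Python sketch. This is not the DGI implementation. The function names, the `min_count` and `fold` thresholds, and the use of simple count ratios in place of a significance test are all assumptions made for illustration. Reverse-complement handling, coverage normalization, and the matching of TS clusters to NS/ND clusters are omitted.

```python
from collections import Counter, defaultdict

K = 20  # k-mer length used in the abstract


def count_kmers(reads, k=K):
    """Count every k-mer across a list of read strings (no alignment)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def classify_kmers(tumor, normal, min_count=2, fold=2.0):
    """Assign differential k-mers to TS / NS / TD / ND groups.

    Assumption: plain thresholds stand in for the (unspecified)
    significance test; min_count and fold are hypothetical parameters.
    """
    groups = {"TS": set(), "NS": set(), "TD": set(), "ND": set()}
    for kmer in tumor.keys() | normal.keys():
        t, n = tumor.get(kmer, 0), normal.get(kmer, 0)
        if n == 0 and t >= min_count:
            groups["TS"].add(kmer)   # seen only in tumor
        elif t == 0 and n >= min_count:
            groups["NS"].add(kmer)   # seen only in normal
        elif n > 0 and t >= fold * n:
            groups["TD"].add(kmer)   # enriched in tumor
        elif t > 0 and n >= fold * t:
            groups["ND"].add(kmer)   # depleted in tumor
    return groups


def cluster_kmers(kmers, k=K):
    """Group k-mers into connected components, linking two k-mers when
    the last k-1 bases of one equal the first k-1 bases of the other."""
    by_prefix, by_suffix = defaultdict(list), defaultdict(list)
    for kmer in kmers:
        by_prefix[kmer[:k - 1]].append(kmer)
        by_suffix[kmer[1:]].append(kmer)

    seen, clusters = set(), []
    for start in sorted(kmers):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            kmer = stack.pop()
            component.append(kmer)
            # successors (shifted right by one) and predecessors (shifted left)
            neighbors = by_prefix.get(kmer[1:], []) + by_suffix.get(kmer[:k - 1], [])
            for nb in neighbors:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters
```

Under these assumptions, a single heterozygous substitution in the middle of a region covered by both samples would yield up to 20 TS 20mers (every 20mer spanning the mutated base) that chain into one cluster, alongside one cluster of ND 20mers carrying the unmutated base.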
Thus, the reference genome is never required at any point in the analysis. Furthermore, these 20mers can easily be searched for in RNA-Seq data to determine whether mutations are expressed. We evaluated the performance of DGI using simulated whole exome sequencing (WES) data and 24 WES data sets from The Cancer Genome Atlas project. We identified more than 90% of the reported somatic mutations from WES, as well as novel mutations. DGI performed as well as other variant callers for substitutions and outperformed them for indels. Our results support the principle of non-alignment sequencing analysis. DGI provides simple but accurate matrices of somatic variants from cancer sequencing data for any population and can be easily scaled to handle population-based studies. This study provides small variant detection (<50 bp) as a first step in a broader effort to develop non-alignment sequencing analysis for structural variation such as copy number changes and chromosomal rearrangements.

Citation Format: HoJoon Lee, Jacob J. Parker, John Bell, Hanlee P. Ji. Dynamic genomic indexing enables accurate somatic variant detection from cancer genome sequencing without sequence alignment limitations. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5288.
