Abstract As the time required to sequence a genome using NGS sequencers decreases rapidly, the use of genomic data in the clinic to inform patient care is becoming feasible. One area that needs to be addressed is the computational time required to align reads to a reference genome and call variants, which now accounts for a larger proportion of the overall process. Standardization and repeatability of processing are critical in a clinical setting. To satisfy these criteria, we have developed a bioinformatics pipeline that starts with deep targeted resequencing of gene sets and related loci, and then automates alignment, variant calling and functional annotation using an open source workflow tool. Because tumor samples are often heterogeneous mixtures, reflecting either subclonal populations or fractions of normal tissue, deep sequencing is crucial for accurate detection of less prevalent, minor-allele somatic mutations. We used Oligonucleotide-Selective Sequencing (OS-Seq) as a targeted resequencing approach to provide deep coverage over the target region with high specificity. With OS-Seq, the inner surface of an Illumina flow cell is modified with primer probe oligos, turning it into an enrichment platform for DNA sequencing libraries. The steps required to go from modification to a ready-to-sequence flow cell are fully automated and integrated into the standard Illumina sequencing preparation using cBot. The computational workflow for this project was developed using the open source tool Bpipe. Once a library of bioinformatics steps has been defined in Bpipe, these steps can be used as building blocks and combined in various ways to build different pipelines. Bpipe also provides comprehensive audit trails and logs so that results are traceable and repeatable.
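The building-block idea described above can be illustrated with a minimal Python sketch. This is not Bpipe's own Groovy-based syntax and not the authors' pipeline code; the names `compose`, `align`, `call_variants`, and `annotate` are hypothetical placeholders, and the stages only mark a dictionary rather than invoking real tools.

```python
from typing import Callable, Dict

Stage = Callable[[Dict], Dict]

def compose(*stages: Stage) -> Stage:
    """Chain stages left to right, appending each stage name to a simple log."""
    def run(state: Dict) -> Dict:
        for stage in stages:
            state = stage(state)
            # Record which stage ran, as a stand-in for an audit trail.
            state = {**state, "log": state.get("log", []) + [stage.__name__]}
        return state
    return run

# Hypothetical placeholder stages; real ones would call external tools.
def align(state: Dict) -> Dict:
    return {**state, "aligned": True}

def call_variants(state: Dict) -> Dict:
    return {**state, "variants_called": True}

def annotate(state: Dict) -> Dict:
    return {**state, "annotated": True}

# The same stages are reused to assemble two different pipelines.
somatic_pipeline = compose(align, call_variants, annotate)
alignment_only = compose(align)

result = somatic_pipeline({"sample": "example"})
```

Here `result["log"]` lists the stages in the order they ran, which loosely mirrors how a workflow tool can make a run traceable.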
This particular workflow includes steps for alignment to the genome using Burrows-Wheeler Aligner (BWA); differentiation and filtering for each OS-Seq primer probe; variant calling using MuTect; and variant filtering and annotation using custom scripts that access resources such as the Single Nucleotide Polymorphism Database (dbSNP), Combined Annotation Dependent Depletion (CADD), the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), and the Drug Gene Interaction Database (DGIdb). As a pilot for a larger study on the underlying genetics of gastric cancer (GC [MIM 137215]), we analyzed matched tumor/normal samples from ten GC patients (6 intestinal, 4 diffuse) using this automated pipeline. We targeted the exons of 74 driver genes involved in several pathways commonly implicated in gastrointestinal cancers, such as RTK-RAS signaling. Median sequencing depths of 1300-2700 were achieved across these 20 samples, yielding an average of four high-confidence somatic mutations per patient in the target genes that were predicted to be deleterious by CADD. All variants were fully annotated with amino acid change, CADD score, any match to mutations curated in COSMIC or reported in TCGA, and possible drug targets as reported in DGIdb. By reducing the data processing turnaround time required to report clinically actionable mutations, this study has significant implications for the development of precision cancer medicine. Our open source bioinformatics analysis approach reports mutations in a format that is readily accessible to the physician and can inform patient care. Citation Format: Susan M. Grimes, HoJoon Lee, Stephanie Greer, Jae-Ho Cheong, Hanlee P. Ji. Automated pipeline for high confidence variant calling and functional annotation, for matched tumor/normal samples sequenced by next-generation sequencing (NGS). [abstract]. 
In: Proceedings of the AACR Special Conference on Translation of the Cancer Genome; Feb 7-9, 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 1):Abstract nr A1-41.