Abstract

Massively parallel sequencing (MPS) is a common technique for identifying genetic alterations in disease. A number of tools and resources have been developed to assist in variant identification, but most are not specifically designed for somatic cancer studies. Tumor samples often suffer from admixture and heterogeneity, and tumor DNA is often derived from formalin-fixed paraffin-embedded samples rather than fresh samples, which introduces artificial bias. These confounders require adjustments to variant quality filtering to ensure a confident call set, but no universal guidelines exist for bioinformatics analyses. In addition, no single tool characterizes the full spectrum of possible DNA-based alterations in tumors, specifically single nucleotide variants, small to medium sized indels, large genomic rearrangements, and copy number alterations. To address these issues, we compared several methods of bioinformatic analysis using whole exome and targeted high-throughput sequencing data from various tumor and tumor-derived cell line studies. Using data sets with known somatic alterations, we developed modifications to the best practices guidelines for the publicly available Genome Analysis Toolkit (GATK) for use in tumor and cell line variant calling. The haplotype score annotation was removed from all filtering consideration because it expects no more than two segregating haplotypes at a given site, an assumption that, for example, caused all known BRAF V600 mutations to be removed. Scores from the read position rank sum test required a reduced filter threshold due to the presence of high alternate allele fractions, which may contribute an artificial bias to this annotation value. Where applicable, we combined GATK results with calls from MuTect to increase the specificity of the somatic changes identified and to further increase sensitivity to singleton mutations.
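The filtering changes described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the germline cutoffs are the commonly cited GATK hard-filter values for SNPs and are an assumption here, the relaxed read position rank sum cutoff of -20.0 is purely hypothetical (the abstract gives no number), and the helper names `failed_filters` and `merge_calls` are invented for illustration. Merging GATK and MuTect calls as a union with per-caller support is likewise an assumption about how the two call sets might be combined.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
FilterSet = Dict[str, Callable[[float], bool]]

# Commonly cited germline hard-filter cutoffs for SNPs (assumed, not taken
# from the abstract). Each predicate returns True when the variant FAILS.
GERMLINE_SNP_FILTERS: FilterSet = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "HaplotypeScore": lambda v: v > 13.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

# Hypothetical reduced cutoff; the abstract does not state the value used.
RELAXED_READ_POS_RANK_SUM = -20.0

# Somatic variant of the filter set: HaplotypeScore is dropped entirely and
# the ReadPosRankSum cutoff is lowered.
SOMATIC_SNP_FILTERS: FilterSet = {
    name: test
    for name, test in GERMLINE_SNP_FILTERS.items()
    if name != "HaplotypeScore"
}
SOMATIC_SNP_FILTERS["ReadPosRankSum"] = lambda v: v < RELAXED_READ_POS_RANK_SUM


def failed_filters(annotations: Dict[str, float], filters: FilterSet) -> List[str]:
    """Return the names of filters the variant fails.

    Annotations missing from the record are skipped rather than failed.
    """
    return [
        name
        for name, fails in filters.items()
        if name in annotations and fails(annotations[name])
    ]


def merge_calls(
    gatk_pass: Iterable[Variant], mutect_calls: Iterable[Variant]
) -> Dict[Variant, Set[str]]:
    """Union two call sets, recording which caller(s) support each variant."""
    support: Dict[Variant, Set[str]] = {}
    for caller, calls in (("GATK", gatk_pass), ("MuTect", mutect_calls)):
        for variant in calls:
            support.setdefault(variant, set()).add(caller)
    return support


if __name__ == "__main__":
    # Toy annotation values for a high allele fraction hotspot-like call.
    toy = {"QD": 12.0, "FS": 3.1, "MQ": 59.0,
           "HaplotypeScore": 25.0, "ReadPosRankSum": -10.0}
    print("germline filters failed:", failed_filters(toy, GERMLINE_SNP_FILTERS))
    print("somatic filters failed:", failed_filters(toy, SOMATIC_SNP_FILTERS))
```

With the toy values, the call is rejected under the germline filter set because of its haplotype score and read position rank sum, but is retained under the modified set. Variants supported by both callers in the `merge_calls` result could be treated as higher confidence, while single-caller variants could be flagged for review.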
Finally, we analyzed three methods for detecting large genomic rearrangements and copy number changes from MPS data (Pindel, ngCGH, and Sequenza) and found inconsistent results that require further scrutiny. In conclusion, we have developed a somatic variant calling pipeline that includes several modifications to the best practices of standard variant calling algorithms, enhancing sensitivity and specificity for somatic variant calling across the entire range of DNA-based alterations in tumors.

Citation Format: Bradley Wubbenhorst. Improving bioinformatic analysis pipelines for calling somatic mutations in tumors and tumor derived cell lines. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4872. doi:10.1158/1538-7445.AM2015-4872
