Abstract

Next-generation sequencing (NGS) has revolutionized genetics and enabled the accurate identification of genetic variants across many genomes. However, detecting biologically important low-frequency variants within genetically heterogeneous populations remains challenging, because such variants are difficult to distinguish from errors arising at the intrinsic NGS error rate. Approaches that overcome these limitations are essential for detecting rare mutations in large cohorts, virus or microbial populations, mitochondrial heteroplasmy, and other heterogeneous mixtures such as tumors. Modifications to library preparation can overcome some of these limitations, but they are experimentally challenging and restricted to skilled biologists. This paper describes a novel quality filtering and base pruning pipeline, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER), designed to detect sequence variants in a complex population with high sequence similarity derived from All-Codon-Scanning (ACS) mutagenesis. A novel fast alignment algorithm, designed for this specific application, runs in O(n) time. CHOPER was applied to a p53 cancer mutant reactivation study based on ACS mutagenesis. Relative to error filtering based on Phred quality scores, CHOPER improved accuracy by about 13% while discarding only half as many bases. These results are a step toward extending the power of NGS to the analysis of genetically heterogeneous populations.
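
For reference, the conventional baseline that the abstract compares CHOPER against, error filtering by Phred quality score, can be sketched as follows. This is a generic illustration, not the paper's pipeline. It assumes Phred+33 encoded FASTQ quality strings, and the cutoff `MIN_Q = 30` and the function names are hypothetical.

```python
# Conventional base-level filtering by Phred quality score (generic sketch,
# not the CHOPER pipeline). Assumes Phred+33 (Sanger / Illumina 1.8+) FASTQ
# quality strings; MIN_Q = 30 is a hypothetical cutoff.

PHRED_OFFSET = 33
MIN_Q = 30


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to a base-call error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def filter_bases(seq: str, qual: str, min_q: int = MIN_Q) -> str:
    """Mask each base whose Phred score is below min_q with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return "".join(
        base if ord(ch) - PHRED_OFFSET >= min_q else "N"
        for base, ch in zip(seq, qual)
    )
```

For example, `filter_bases("ACGTAC", "II#I5I")` returns `"ACNTNC"`, because `'I'` encodes Q40, `'#'` Q2 and `'5'` Q20. A Q30 call corresponds to an error probability of 0.001, so variants present at lower frequencies remain hard to separate from sequencing errors with a per-base cutoff alone.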

Highlights

  • Next-generation sequencing (NGS) is a developing research area with a rapidly expanding range of applications [1,2,3]

  • Individual mutations in heterogeneous ACS libraries occur at frequencies below the NGS sequencing error rate, a problem that has previously precluded identification of these biologically meaningful variants. To overcome this limitation, we developed a series of quality filtering and base pruning operations, called Complex Heterogeneous Overlapped Paired-End Reads (CHOPER) filtering, that together provide novel error filtering and mutation detection in the complex heterogeneous population derived from ACS mutagenesis [15]

  • Notably, p53-R175H, the single most common p53 cancer mutant found in human tumors, had not been reactivated by any single amino acid change in previous studies [15]
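
The highlights note that individual mutations in ACS libraries occur below the NGS error rate. One generic way overlapping paired-end reads can help, suggested by the CHOPER name but not taken from the paper, is to keep a base only where both mates of a pair cover the same position and report the same call. The sketch below assumes the mates are already placed on reference coordinates without insertions or deletions; `place`, `overlap_consensus`, and the `min_q = 20` cutoff are hypothetical.

```python
# Concordance filtering over the overlap of a paired-end read pair (generic
# sketch, not the published CHOPER algorithm). Assumes both mates are already
# placed on reference coordinates without insertions or deletions; each mate
# is a hypothetical (start, sequence, per-base Phred scores) tuple.

from typing import Dict, List, Tuple

Mate = Tuple[int, str, List[int]]


def place(mate: Mate) -> Dict[int, Tuple[str, int]]:
    """Map each reference position covered by a mate to its (base, quality)."""
    start, seq, quals = mate
    return {start + i: (b, q) for i, (b, q) in enumerate(zip(seq, quals))}


def overlap_consensus(r1: Mate, r2: Mate, min_q: int = 20) -> Dict[int, str]:
    """Keep a base only where both mates cover it, agree, and both pass min_q.

    Positions outside the overlap, discordant positions, and low-quality
    positions are pruned rather than guessed.
    """
    m1, m2 = place(r1), place(r2)
    kept: Dict[int, str] = {}
    for pos in sorted(m1.keys() & m2.keys()):
        (b1, q1), (b2, q2) = m1[pos], m2[pos]
        if b1 == b2 and min(q1, q2) >= min_q:
            kept[pos] = b1
    return kept
```

For example, `overlap_consensus((100, "ACGTAC", [30] * 6), (103, "TTCGG", [30] * 5))` returns `{103: "T", 105: "C"}`: the mates overlap at positions 103 to 105, and position 104 is pruned because they disagree (A versus T). Under the simplifying assumption that errors in the two mates are independent, with per-base error rate e spread evenly over the three wrong bases, an error survives only if both mates make the same wrong call, with probability e²/3, or about 3.3 × 10⁻⁷ for e = 0.001. Real mate errors can be correlated, so this is an optimistic bound.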


Introduction

Next-generation sequencing (NGS) is a developing research area with a rapidly expanding range of applications [1,2,3]. The high coverage achievable with NGS methods has enabled the detection of many low-frequency variants, including somatic mutations across the genome [1,4,5]. In these traditional applications of NGS, the cell population has a homogeneous genome, and so …

PLOS ONE | DOI:10.1371/journal.pone.0116877

Principal Investigators R.H.L. and P.K. have an equity interest in Actavalon, Inc., and serve on the Scientific Advisory Board. The terms of this arrangement have been reviewed and approved by the University of California, Irvine, in accordance with its conflict of interest policies. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.
