Abstract

Background

Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data.

Results

UDPS of a 167-nucleotide fragment of the HIV-1 SG3Δenv plasmid was performed on the Roche/454 platform. The plasmid was diluted to one copy, PCR amplified and subjected to bidirectional UDPS on three occasions. The dataset consisted of 47,693 UDPS reads. Raw UDPS data had an average error frequency of 0.30% per nucleotide site. Most errors were insertions and deletions in homopolymeric regions. We used a cleaning strategy that removed almost all indel errors but had little effect on substitution errors; it reduced the error frequency to 0.056% per nucleotide. In cleaned data the error frequency was similar in homopolymeric and non-homopolymeric regions, but varied considerably across sites. These site-specific error frequencies were moderately, but still significantly, correlated between runs (r = 0.15–0.65) and between forward and reverse sequencing directions within runs (r = 0.33–0.65). Furthermore, transition errors were 48 times more common than transversion errors (0.052% vs. 0.001%; p < 0.0001). Collectively, the results indicate that a considerable proportion of the sequencing errors that remained after data cleaning were generated during the PCR that preceded UDPS.

Conclusions

A majority of the sequencing errors that remained after data cleaning were introduced by PCR prior to sequencing, which means that they will be independent of the platform used for next-generation sequencing. The transition vs. transversion error bias in cleaned UDPS data will influence the detection limits of rare mutations and sequence variants.
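The per-nucleotide error frequency and the transition vs. transversion split reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy reference and the toy reads are hypothetical, and the sketch assumes reads that are already cleaned, gap-free and aligned to a reference of the same length, so only substitution errors are counted.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, read_base):
    """Return 'transition', 'transversion', or None when the bases match."""
    if ref_base == read_base:
        return None
    pair = {ref_base, read_base}
    # A<->G and C<->T stay within one chemical class: transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_error_summary(reference, reads):
    """Summarise substitution errors in reads aligned to `reference`.

    Assumption: every read is gap-free, has the same length as the
    reference, and `reads` is not empty.
    """
    counts = Counter()
    per_site = [0] * len(reference)
    for read in reads:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            kind = classify_substitution(ref_base, read_base)
            if kind is not None:
                counts[kind] += 1
                per_site[i] += 1
    total_bases = len(reference) * len(reads)
    return {
        "error_frequency": sum(counts.values()) / total_bases,
        "transition_frequency": counts["transition"] / total_bases,
        "transversion_frequency": counts["transversion"] / total_bases,
        "per_site_frequency": [n / len(reads) for n in per_site],
    }


# Toy data (hypothetical): one A->G transition and one G->T transversion.
summary = substitution_error_summary("ACGT", ["ACGT", "GCGT", "ACTT", "ACGT"])
print(summary)
```

The `per_site_frequency` list is the kind of site-specific profile that could then be compared between runs or between sequencing directions, for example with a Pearson correlation.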

Highlights

  • Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants

  • In this study we have investigated the types and frequencies of errors that occur during repeated UDPS of an HIV-1 clone (SG3Δenv)

  • The target amplicon consisted of a 167-base pair fragment of the HIV-1 pol gene corresponding to amino acids 170–224 of the reverse transcriptase

Introduction

Ultra-deep pyrosequencing (UDPS), which is one of the applications of next-generation sequencing (NGS), offers new possibilities to detect minority sequence variants [1,2,3,4]. Population Sanger sequencing can only detect minority variants that represent more than 10–20% of a heterogeneous sequence population (e.g. an HIV-1 quasispecies) [6,7]. This restricted sequencing depth sometimes limits research and clinical utility. Minority HIV resistance mutations, below the detection limit of population Sanger sequencing, have been shown to be of clinical relevance [8,9,10,11,12]. The importance of sequencing depth has also been shown in studies of rare cancer cells in biopsies [13].

