Abstract

BackgroundThe rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007, for the older GS20 system based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments.ResultsWe obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables.ConclusionsThe pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites should partly compensate for even high error rates, although it may prove more difficult than previous thought to distinguish between low-frequency alleles and errors.

Highlights

  • The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained

  • Accuracy and quality of sequences We assessed the quality of the sequences obtained by 454 GS-FLX Titanium sequencing, using the control DNA fragment Type I sequences as reference templates

  • The quality of these control sequences is not influenced by the samples themselves, with Titanium technology, in which loading beads are isolated from each other and there should be no interference from adjacent beads

Read more

Summary

Introduction

The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Many aspects of basic, applied and clinical research rely on the generation of enormous amounts of sequence data from various sample sources, to assess polymorphism (mostly SNPs), or expression data (RNA-Seq) at the genome level [1,2] This shift in the scale of sequence acquisition has been achieved by simultaneous progress in bioinformatics, the availability of genome assemblies and key technical findings in the domains of biochemistry and sequencing device physics [3]. 454 GS-FLX Titanium technology provides around 1,000,000 sequences in a single 10-hour run These sequences, with an average read length equal to 330 bp, may be up to 500 bp in shotgun libraries conditions, much longer than can be obtained with the other available approaches. This makes mapping easier, for repetitive regions, and facilitates de novo genome sequencing, exome capture, metagenomics and amplicon sequencing [4]

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call