Abstract
BackgroundHigh throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long read high throughput data. While most other sequencing techniques produce reading errors mainly comparable with substitutions, pyrosequencing produce errors mainly comparable with gaps. These errors are less efficiently detected by most conventional alignment programs and may produce inaccurate alignments.ResultsWe suggest a novel algorithm for calculating the optimal local alignment which utilises flowpeak information in order to improve alignment accuracy. Flowpeak information can be retained from a 454 sequencing run through interpretation of the binary SFF-file format. This novel algorithm has been implemented in a program named FAAST (Flow-space Assisted Alignment Search Tool).ConclusionsWe present and discuss the results of simulations that show that FAAST, through the use of the novel algorithm, can gain several percentage points of accuracy compared to Smith-Waterman-Gotoh alignments, depending on the 454 data quality. Furthermore, through an efficient multi-thread aware implementation, FAAST is able to perform these high quality alignments at high speed.The tool is available at http://www.ifm.liu.se/bioinfo/
Highlights
High throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long read high throughput data
We suggest the use of flowspace assisted Smith-Waterman-Gotoh alignments, i.e. giving the local alignment algorithm the ability to correct for likely sequencing errors while computing the alignment
The model used for the Smith-Waterman-Gotoh alignment is match/mismatch score = 2/-3 and gap open/extended penalty 5/2, and the additional parameter for the flow-space assisted local alignment is program default (k = 0.25)
Summary
High throughput pyrosequencing (454 sequencing) is the major sequencing platform for producing long read high throughput data. While most other sequencing techniques produce reading errors mainly comparable with substitutions, pyrosequencing produce errors mainly comparable with gaps. These errors are less efficiently detected by most conventional alignment programs and may produce inaccurate alignments. The first 2nd generation sequencing method was 454 sequencing, introduced in 2005 with the GS20 sequencing machine which produced 20 million base-pairs (Mbp) per run [1]. 454 sequencing has since been improved steadily both regarding quality and throughput, and the GS FLX Titanium, introduced in 2008, produces 500 Mbp per run, as reads of approximately 350 bp [2].
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.