Abstract

Motivation: Although there are many algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $1 million prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms, which depend on fast and accurate alignment.

Contribution: We introduce ARYANA, a fast gapped read aligner built on the BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners (Bowtie2, BWA and SeqAlto) with comparable generality and accuracy. Instead of time-consuming backtracking procedures for handling mismatches, ARYANA uses the seed-and-extend algorithmic framework and improves its efficiency significantly by integrating novel algorithmic techniques, including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As read length increases, ARYANA's advantage in speed and alignment rate becomes more evident, which matches the trend toward longer reads as sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine.

Availability: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner
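The seed-and-extend idea named above can be illustrated with a minimal sketch. This is not ARYANA's implementation: it assumes a plain k-mer dictionary in place of the BWA index, fixed-length seeds instead of dynamic seed selection, and exact-match extension in both directions with no gap filling. All function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def extend_seed(reference: str, read: str, ref_pos: int, read_pos: int, k: int):
    """Grow an exact seed hit to the left and to the right while bases agree.

    Returns (ref_start, read_start, length) of the maximal exact match.
    """
    left = 0
    while (ref_pos - left > 0 and read_pos - left > 0
           and reference[ref_pos - left - 1] == read[read_pos - left - 1]):
        left += 1
    right = k  # the seed itself already matches k bases
    while (ref_pos + right < len(reference) and read_pos + right < len(read)
           and reference[ref_pos + right] == read[read_pos + right]):
        right += 1
    return ref_pos - left, read_pos - left, left + right


def best_exact_anchor(reference: str, read: str, k: int = 4):
    """Return the longest extended seed, or None if no seed hits the index."""
    index = build_kmer_index(reference, k)
    best = None
    for read_pos in range(len(read) - k + 1):
        seed = read[read_pos:read_pos + k]
        for ref_pos in index.get(seed, []):
            hit = extend_seed(reference, read, ref_pos, read_pos, k)
            if best is None or hit[2] > best[2]:
                best = hit
    return best
```

A real aligner would go on to fill the regions between such anchors with dynamic programming so that mismatches and gaps are handled. The sketch stops at the anchor.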

Highlights

  • Every living cell carries a book of life consisting of several thousand to billions of characters with answers to many vital questions

  • Maxam read the first 24-character word of the book [1], while F. Sanger and his colleagues were developing another sequencing method based on the application of labeled dideoxynucleotide triphosphates that act as chain-terminators in a PCR reaction [2,3]

  • Every read is individually aligned by ARYANA, which enables using it in distributed computing frameworks by partitioning the input read data set, in addition to the multithreaded parallel infrastructure embedded in ARYANA that permits complete CPU usage when running on a multi-core machine
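Because each read is aligned independently, the input can be split into chunks and the per-chunk results concatenated. The sketch below illustrates that partitioning under stated assumptions: `align_read` is a hypothetical stand-in that only finds exact matches, and a thread pool stands in for separate workers or machines. It does not reflect ARYANA's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_reads(reads, n_parts):
    """Split reads into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(reads), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(reads[start:end])
        start = end
    return chunks


def align_read(reference, read):
    """Hypothetical placeholder aligner: exact-match position, or -1."""
    return reference.find(read)


def align_chunk(reference, chunk):
    """Align every read of one chunk; no state is shared between reads."""
    return [align_read(reference, read) for read in chunk]


def align_partitioned(reference, reads, n_parts=2):
    """Align chunks concurrently and return results in input order."""
    chunks = partition_reads(reads, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map preserves chunk order, so output order matches input order.
        per_chunk = list(pool.map(lambda c: align_chunk(reference, c), chunks))
    return [pos for chunk_result in per_chunk for pos in chunk_result]
```

Since no read depends on another, the partitioned result equals the result of aligning all reads in a single pass, whatever the number of chunks.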


Summary

Introduction

Every living cell carries a book of life consisting of several thousand to billions of characters with answers to many vital questions. Sanger and his colleagues were developing another sequencing method based on the application of labeled dideoxynucleotide triphosphates that act as chain-terminators in a PCR reaction [2,3].
