Abstract

Gene sequence alignment, used to recognize the homology and variability in different species, is an important part of Bioinformatics. Creating indexes is a crucial step of gene sequence alignment algorithm. Usual algorithms of creating indexes are divided into two types. The first is algorithm based on hash table, while another is based on suffix tree or suffix array, among which BWT (Burrows-Wheeler Transform) index is a significant index structure. Currently, BWT index needs several hours’ serial computing in building a large genome sequence (such as human genome sequence). A parallel computing method based on Hadoop is presented is this paper to build suffix array and BWT index. Map Reduce is adopted as a type of data processing function, cutting suffix array into block, which will be handled separately. Ultimately, totally ordered suffix array and BWT index are output, reducing the time in building index. Meanwhile, verifying the effectiveness of the algorithm by experiments.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call