The Method of a Gene Sequence Alignment BWT Index Based on Hadoop

Nan Li

doi:10.11648/j.ijgg.20160403.13

Abstract

Gene sequence alignment, which is used to identify homology and variation across species, is an important part of bioinformatics. Index construction is a crucial step in gene sequence alignment algorithms. Index construction algorithms fall into two types: those based on hash tables, and those based on suffix trees or suffix arrays, among which the BWT (Burrows-Wheeler Transform) index is a significant index structure. Currently, building a BWT index for a large genome sequence (such as the human genome) takes several hours of serial computation. This paper presents a parallel computing method based on Hadoop for building the suffix array and the BWT index. MapReduce is adopted as the data processing model: the suffix array is cut into blocks, which are handled separately. Finally, a totally ordered suffix array and the BWT index are output, reducing the time needed to build the index. Experiments verify the effectiveness of the algorithm.
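
The block-wise construction described in the abstract can be illustrated with a minimal, single-process sketch. It is not the paper's implementation and does not use Hadoop. It assumes (the abstract does not say how blocks are formed) that suffixes are partitioned by their first k characters, so that sorting each block independently and concatenating the blocks in key order yields a totally ordered suffix array. The function names, the parameter k, and the "$" terminator are illustrative choices.

```python
from collections import defaultdict


def map_suffixes(text, k):
    """Map step (assumed scheme): emit (prefix key, suffix start) pairs."""
    for i in range(len(text)):
        yield text[i:i + k], i


def shuffle(pairs):
    """Group suffix start positions by key, as a MapReduce shuffle would."""
    buckets = defaultdict(list)
    for key, pos in pairs:
        buckets[key].append(pos)
    return buckets


def reduce_bucket(text, positions):
    """Reduce step: sort one block of suffixes independently of the others."""
    return sorted(positions, key=lambda i: text[i:])


def build_sa_and_bwt(sequence, k=2):
    """Return (suffix array, BWT string) for sequence plus a '$' terminator."""
    text = sequence + "$"  # '$' sorts before A, C, G, T and all letters
    buckets = shuffle(map_suffixes(text, k))
    sa = []
    for key in sorted(buckets):  # key order gives the total order across blocks
        sa.extend(reduce_bucket(text, buckets[key]))
    # BWT[i] is the character preceding suffix SA[i]; index -1 wraps to '$'.
    bwt = "".join(text[i - 1] for i in sa)
    return sa, bwt


if __name__ == "__main__":
    print(build_sa_and_bwt("banana"))
```

In a real Hadoop job the buckets would be processed by separate reducers. Here they are sorted one after another purely to show why the concatenated output is globally ordered. The naive per-block sort compares whole suffixes, so it is only suitable for small inputs.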


Journal: International Journal of Genetics and Genomics
Publication Date: Jan 1, 2016
License type: cc-by

Similar Papers

Solving All-Pairs Suffix Prefix – Theory and Practice
Maan Haj Rachid ... Qutaibah Malluhi
01 Jan 2015

Linear-time String Indexing and Analysis in Small Space
Djamal Belazzougui ... Veli Mäkinen
ACM Transactions on Algorithms | VOL. 16
09 Mar 2020

HIA: a genome mapper using hybrid index-based sequence alignment.
Jongpill Choi ... Seong Beom Cho
Algorithms for Molecular Biology | VOL. 10
01 Dec 2015

An Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays for Alphabets of Non-negligible Size
Dong Kyue Kim ... Jeong Eun Jeon
01 Jan 2004
