Abstract
In the genome sequence alignment problem, a reference string and a number of query strings, referred to as short reads, are given; the goal is to find the occurrences of these query strings in the reference string. The huge volume of reads generated by new sequencing technologies (Illumina/Solexa) calls for efficient algorithms that require both less memory and less computation time. A number of indexing and string matching techniques exist for aligning short reads to a reference string (genome), but in each of them the index of the reference string is large. In this paper, a new self-compressed index technique (BWT-WT) is proposed. The BWT-WT scheme is based on the Burrows-Wheeler Transform (BWT) and the wavelet tree (WT). BWT-WT also supports exact alignment of DNA sequence reads. The performance of BWT-WT is compared with that of other BWT-based short read alignment tools. Experiments show that the BWT-WT based program achieves better compression and faster searching than the other existing tools.
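To make the combination of a BWT and a wavelet tree concrete, the following is a minimal Python sketch of exact read matching by backward search, where the rank queries on the BWT are answered by a wavelet tree. It is not the BWT-WT implementation described in the paper: the names `bwt_and_suffix_array`, `WaveletTree`, and `find_read` are hypothetical, the suffix array is built by naive sorting and kept in full, and the wavelet tree stores plain prefix-count lists rather than compressed bit vectors, so it illustrates the search logic only, not the space savings.

```python
def bwt_and_suffix_array(text):
    """Return (BWT, suffix array) of text + '$' using a naive suffix sort."""
    s = text + "$"  # '$' is assumed not to occur in text and sorts before A, C, G, T
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT symbol for suffix i is the character preceding it (cyclically).
    return "".join(s[i - 1] for i in sa), sa


class WaveletTree:
    """Pointer-based wavelet tree supporting rank(c, i) = count of c in seq[:i]."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        if len(self.alphabet) <= 1:
            return  # leaf node: a single symbol, nothing more to store
        mid = len(self.alphabet) // 2
        self.left_syms = set(self.alphabet[:mid])
        # prefix[i] = number of right-half symbols (1-bits) in seq[:i].
        # A real index would use a succinct bit vector here (assumption: demo only).
        self.prefix = [0]
        for ch in seq:
            self.prefix.append(self.prefix[-1] + (ch not in self.left_syms))
        self.left = WaveletTree(
            [c for c in seq if c in self.left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [c for c in seq if c not in self.left_syms], self.alphabet[mid:])

    def rank(self, c, i):
        node = self
        while len(node.alphabet) > 1:
            ones = node.prefix[i]
            if c in node.left_syms:
                i, node = i - ones, node.left   # map position into the 0-bit child
            else:
                i, node = ones, node.right      # map position into the 1-bit child
        return i if node.alphabet and node.alphabet[0] == c else 0


def find_read(reference, read):
    """Return sorted start positions of exact occurrences of read in reference."""
    if not read:
        return []
    bwt, sa = bwt_and_suffix_array(reference)
    wt = WaveletTree(bwt)
    # C[c] = number of symbols in the indexed text that are smaller than c.
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # Backward search: narrow the half-open suffix-array interval [lo, hi),
    # consuming the read from its last character to its first.
    lo, hi = 0, len(bwt)
    for ch in reversed(read):
        if ch not in C:
            return []
        lo = C[ch] + wt.rank(ch, lo)
        hi = C[ch] + wt.rank(ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    print(find_read("ACGTACGTTACG", "ACG"))  # [0, 4, 9]
```

Each step of the backward search costs two rank queries, and each rank query walks one root-to-leaf path of the wavelet tree, so the search time depends on the read length and the alphabet size rather than on the length of the reference. Locating positions from a sampled rather than a full suffix array is a common space-saving refinement that this sketch omits.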