Abstract

Now that reference genomes have been sequenced for many organisms, a central problem of the post-genomic era is how to re-sequence and assemble individual genomes from very large numbers of reads. In this paper, we present a re-sequencing tool designed for Next Generation Sequencing (NGS) data, which consist of huge numbers of short reads that must be aligned to a reference genome. We modified and implemented the Burrows-Wheeler Transform and FM-index algorithms to build an index of the human genome, and we propose segmenting each short read into multiple non-overlapping seeds, which allows us to align short reads that have a large Hamming distance from the reference. Finally, we used simulated datasets and real datasets from the 1000 Genomes Project to demonstrate the performance of our tool on a personal computer, and compared the results with those of two widely used tools, Bowtie and SOAPv2.
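To make the seed idea concrete, the following is a minimal sketch and not the tool's actual implementation. It assumes the usual pigeonhole argument: if a read differs from the reference in at most k positions, then splitting it into k + 1 non-overlapping seeds guarantees that at least one seed matches exactly. The function names (`split_into_seeds`, `find_all`, `align`) are hypothetical, and plain substring search stands in for the FM-index lookup.

```python
def split_into_seeds(read, n_seeds):
    """Split a read into n_seeds contiguous, non-overlapping (offset, seed) pairs."""
    if n_seeds < 1 or len(read) < n_seeds:
        raise ValueError("read is too short for the requested number of seeds")
    base, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        # The first `extra` seeds are one base longer so the seeds cover the read.
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_all(text, pattern):
    """Yield every start of an exact match (assumed stand-in for an FM-index query)."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def align(read, reference, max_mismatches):
    """Return sorted (position, mismatches) pairs within max_mismatches."""
    hits = {}
    # k + 1 seeds: at least one must match exactly if there are at most k mismatches.
    for offset, seed in split_into_seeds(read, max_mismatches + 1):
        for seed_pos in find_all(reference, seed):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # the full read would fall off the reference
            distance = hamming(read, reference[start:start + len(read)])
            if distance <= max_mismatches:
                hits[start] = distance
    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    read = "AGGTTACCCGAT"  # reference[4:16] with two substitutions
    print(align(read, reference, max_mismatches=2))
```

In this sketch each exact seed hit is verified by a full Hamming-distance check of the read against the reference, so raising the number of seeds tolerates more mismatches at the cost of shorter, less specific seeds.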
