Abstract

With the development of Next Generation Sequencing techniques, the analysis of megabyte-sized whole genome sequences has become common. Genome sequence comparison is generally conducted with alignment algorithms. Alignment is accurate, but it assumes that the target sequence is short (less than a few kilobytes), because it requires quadratic time and space, O(n^2), where n is the length of the target and query sequences. To overcome this drawback in whole-genome-scale comparison, we propose a new method for finding locally similar subsequences among whole genomes based on random walk visualization. With this method, the sequence search problem in DNA strings is reduced to finding parts of a geometric object within a relatively small geometric space. When comparing modified sequences of similar length, we confirm that the comparison model is appropriate because it accurately reflects the degree of similarity. With the query sequence comparison model built on a 200 MB whole genome sequence and using the compressed coordinate information, 10 MB sequences could be searched in 22 s, a substantial reduction in time compared to alignment.
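
To make the idea of a random walk representation concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which step mapping or compression scheme is used, so the `STEPS` table (one unit step in 2D per base), the `stride` subsampling in `compress`, and all function names here are assumptions chosen for illustration only.

```python
# Hypothetical step mapping: the abstract does not specify one.
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq):
    """Return the 2D points visited by the walk, starting at the origin."""
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper():
        # Symbols outside A/C/G/T (e.g. N) leave the position unchanged.
        dx, dy = STEPS.get(base, (0, 0))
        x += dx
        y += dy
        points.append((x, y))
    return points


def compress(points, stride):
    """Assumed compression: keep every `stride`-th point of the walk."""
    return points[::stride]


def displacement_profile(points):
    """Vectors between consecutive points, independent of the start position."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


if __name__ == "__main__":
    walk = dna_walk("ACGT")
    coarse = compress(walk, 2)
    print(walk)
    print(coarse)
    print(displacement_profile(coarse))
```

In this sketch a sequence of length n becomes a path of n + 1 points, and `compress` reduces it to roughly n / stride points, so comparing two paths operates on far fewer elements than a quadratic alignment table. Because `displacement_profile` depends only on differences between points, the same subsequence yields the same profile wherever its walk starts, provided the subsampling offsets line up.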

