Abstract

The meaning of biological sequences is a central problem of modern biology. Although string matching is well understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. To align biologically reasonable strings, we proposed the Walking Tree Method [4,5,6,7,8], an approximate string alignment method that can handle insertions, deletions, substitutions, translocations, and more than one level of inversions. Our earlier versions were able to align whole bacterial genomes (1 Mbp) and to discover and verify genes. Because extremely long sequences can now be deciphered rapidly and accurately without amplification [2,3,15], speeding up the method has become necessary. Using a technique we call recurrence reduction, in which some computations are looked up rather than recomputed, we significantly improve performance, e.g., a 400% improvement for a 1-million-base-pair alignment. In theory, our method can align a string of length |P| with a string of length |T| in time |P||T|/(n log |P|) using n processors in parallel. In practice, we can align 10 Mbp strings within a week using 30 processors.

Keywords: Lyme Disease, Chlamydia Trachomatis, Borrelia Burgdorferi, String Match, Text String

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
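To give a rough sense of scale for the theoretical bound |P||T|/(n log |P|), the following back-of-the-envelope sketch evaluates it for a hypothetical alignment of two 10 Mbp strings on 30 processors. It is not the authors' implementation and not a running-time prediction: it assumes a base-2 logarithm (the abstract does not state a base) and ignores all constant factors. The function name `per_processor_work` is hypothetical.

```python
import math


def per_processor_work(p_len: int, t_len: int, processors: int) -> float:
    """Evaluate |P||T| / (n * log2 |P|), ignoring constant factors.

    Assumption: the logarithm is base 2; the source does not specify a base.
    """
    if p_len < 2 or t_len < 1 or processors < 1:
        raise ValueError("need |P| >= 2, |T| >= 1 and n >= 1")
    return p_len * t_len / (processors * math.log2(p_len))


# Hypothetical inputs: two 10 Mbp strings aligned on 30 processors.
P_LEN = T_LEN = 10_000_000
N_PROCS = 30

full = P_LEN * T_LEN                      # |P||T|: the bound with no speedup at all
reduced = per_processor_work(P_LEN, T_LEN, N_PROCS)

print(f"{full:.3e} -> {reduced:.3e}")
print(f"combined factor n*log2|P| = {full / reduced:.1f}")
```

Under these assumptions, the ratio of the two printed values is n log2 |P|, roughly 700 for these inputs; the bound says nothing about the constants hidden in each unit of work, so the absolute figures should not be read as operation counts.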