Abstract
Comparative genomic analysis, at its most fundamental level, involves the alignment and analysis of linear strings of DNA. Many useful and powerful tools, such as BlastN and ClustalW, can search for and align, respectively, similar strings of DNA from a variety of species. However, interesting genomic patterns cannot be immediately visualized within the information content embedded in long genomic strings without extensive a priori knowledge. More problematic is the question of whether we will be able to crystallize long genomic sequences and analyze their true secondary and tertiary structures. It is, of course, these putative motifs that bind to the three-dimensional structures of proteins and induce replication and transcription events. The W-curve is a numerical mapping algorithm that allows one to visualize the information content of genomic motifs geometrically. Patterns of Alu elements, LINEs, SINEs, and duplicated sequences may be easily visualized with the W-curve. It is our hope that this pattern recognition algorithm will lead to visualization tools that track the evolutionary history of motif patterns. The combinatorics of DNA motif crossover-recombination events will be more easily followed as we continue to sequence more and more genomes. In our laboratory, we are currently collaborating with mathematicians and computer scientists to develop and test tools, such as the W-curve, for analyzing patterns in long genomic sequences. In this paper, we examine the limitations of using the W-curve to infer the phylogenetic history of species.