Abstract

Genome sequencing technology is generating large sequence databases at such a rate that advances in computer hardware alone cannot keep pace: more efficient algorithms are needed. Here, an alignment-free method of sequence comparison and visualisation based on the Chaos Game Representation (CGR) and multifractal analysis is explored as a way to search and filter a data set of over 1500 microbial genomes. Whereas BLAST takes 25 hours to search this data set with large sequence fragments (e.g. 100 kb), the method introduced here can reduce the data set by 95% (from 1550 target species to just 50) in about 15 minutes, and it predicts the exact species correctly in 67% of cases. The results presented here demonstrate that CGR is worth further investigation as a fast method for genome sequence comparison on large data sets, and various ways to develop the method further are discussed.
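The abstract names CGR and multifractal analysis without giving implementation details. The sketch below is one minimal, stdlib-only illustration of how such a filter could work. It is an assumption throughout, not the authors' method. It builds a frequency CGR grid, estimates generalized (Rényi) dimensions D_q from box measures at two grid resolutions, and shortlists reference genomes by the distance between their D_q spectra. The corner assignment, the q values, the resolutions `k1`/`k2`, and the hypothetical `shortlist`/`top_n` filtering step are illustrative choices, not parameters reported in the paper.

```python
import math
from typing import Dict, List, Sequence

# Assumed corner convention (common in CGR work, not taken from the paper):
# A=(0,0), C=(0,1), G=(1,1), T=(1,0) in (x, y) unit-square coordinates.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> List[List[int]]:
    """Frequency CGR: count chaos-game points on a 2^k x 2^k grid.

    After k chaos-game steps, a point's cell depends only on the last k bases,
    with the most recent base setting the highest-order bit. Counting k-mers
    with that bit layout reproduces the grid exactly, avoiding float rounding.
    Row 0 is the bottom (y = 0) edge.
    """
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if any(base not in CORNERS for base in window):
            continue  # skip k-mers containing N or other ambiguity codes
        row = col = 0
        for j, base in enumerate(window):
            cx, cy = CORNERS[base]
            col |= cx << j  # later bases set higher-order bits
            row |= cy << j
        grid[row][col] += 1
    return grid


def generalized_dimension(seq: str, q: float, k1: int = 4, k2: int = 6) -> float:
    """Two-point estimate of the Renyi dimension D_q with box side eps = 2^-k.

    Z(q, eps) = sum_i p_i^q scales as eps^tau(q), with tau(q) = (q - 1) D_q.
    For q = 1 the information dimension uses sum_i p_i log p_i ~ D_1 log eps.
    """
    def measure(k: int) -> float:
        counts = [c for row in fcgr(seq, k) for c in row if c > 0]
        total = sum(counts)
        if total == 0:
            raise ValueError(f"no valid {k}-mers in sequence")
        probs = [c / total for c in counts]
        if q == 1:
            return sum(p * math.log(p) for p in probs)
        return math.log(sum(p ** q for p in probs))

    log_eps1, log_eps2 = -k1 * math.log(2), -k2 * math.log(2)
    slope = (measure(k2) - measure(k1)) / (log_eps2 - log_eps1)
    return slope if q == 1 else slope / (q - 1)


def spectrum(seq: str, qs: Sequence[float] = (-2, -1, 0, 1, 2)) -> List[float]:
    """Feature vector of D_q values; the q range is an illustrative choice."""
    return [generalized_dimension(seq, q) for q in qs]


def shortlist(query: str, references: Dict[str, str], top_n: int = 50) -> List[str]:
    """Hypothetical filter: keep the top_n references whose D_q spectra are
    closest (Euclidean distance) to the query's, for a slower exact search."""
    target = spectrum(query)
    scores = {name: math.dist(target, spectrum(s)) for name, s in references.items()}
    return sorted(scores, key=scores.get)[:top_n]
```

Counting k-mers rather than iterating floating-point coordinates is a deliberate choice. For a sequence longer than k, every chaos-game point after the first k - 1 lands in the cell named by its last k bases, so the grid is identical but free of rounding at cell boundaries. A cheap spectrum comparison like this would only narrow the candidate set. Any final species call would still rely on a more exact comparison of the survivors.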