MICA: desktop software for comprehensive searching of DNA databases.

William A Stokes, Benjamin S Glick

doi:10.1186/1471-2105-7-427

Abstract

Background: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory.

Results: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited.

Conclusion: MICA is suitable as a search engine for desktop DNA analysis software.
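To make the idea in the abstract concrete, the following is a minimal sketch of k-mer indexing with degenerate query matching. It is not MICA's implementation: the compact array layout that gives the roughly 2L-byte index is not described here, and a plain Python dictionary uses far more memory. The function names (`build_index`, `search`, `symbol_matches`), the choice of k, and the matching rule (a database symbol matches a query symbol when every base it stands for is allowed by the query symbol) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# The 15-symbol IUPAC DNA alphabet as bitmasks over A=1, C=2, G=4, T=8.
IUPAC = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 5, "Y": 10, "S": 6, "W": 9, "K": 12, "M": 3,
    "B": 14, "D": 13, "H": 11, "V": 7, "N": 15,
}


def symbol_matches(db_sym, query_sym):
    """Assumed rule: the database symbol's base set lies within the query's."""
    return IUPAC[db_sym] & ~IUPAC[query_sym] == 0


def expansions(query_sym):
    """All alphabet symbols that a query symbol is allowed to match."""
    return [s for s in IUPAC if symbol_matches(s, query_sym)]


def build_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def window_matches(seq, start, query):
    """Check the full query against seq beginning at start."""
    return all(symbol_matches(seq[start + j], q) for j, q in enumerate(query))


def search(seq, index, k, query):
    """Return sorted start positions of all matches of query in seq."""
    n, m = len(seq), len(query)
    if m == 0 or m > n:
        return []
    if m < k:
        # Query shorter than k: this sketch simply scans the sequence.
        return [i for i in range(n - m + 1) if window_matches(seq, i, query)]

    def cost(off):
        # Number of concrete k-mers the seed window at this offset expands to.
        c = 1
        for q in query[off:off + k]:
            c *= len(expansions(q))
        return c

    # Use the least degenerate k-long window of the query as the seed.
    off = min(range(m - k + 1), key=cost)
    hits = []
    for kmer in product(*(expansions(q) for q in query[off:off + k])):
        for pos in index.get("".join(kmer), ()):
            start = pos - off
            if start >= 0 and start + m <= n and window_matches(seq, start, query):
                hits.append(start)
    return sorted(hits)


# Hypothetical toy database; position 4 holds the degenerate symbol R (A or G).
seq = "ACGTRACGTAACG"
index = build_index(seq, 3)
exact_hits = search(seq, index, 3, "ACG")
degenerate_hits = search(seq, index, 3, "ACGTN")
```

Only the index entries for the seed k-mers are consulted before candidates are verified, which mirrors the abstract's point that a typical search touches a small fraction of the index.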

Highlights

Molecular biologists work with DNA databases that often include entire genomes
MICA is suitable as a search engine for desktop DNA analysis software
MICA was coded in C++ and tested on a 2.5-GHz G5 Macintosh running OS X (10.4, Tiger) with 2.5 GB of RAM

Journal: BMC Bioinformatics	Publication Date: Oct 3, 2006
Citations: 25	License type: CC BY 2.0
