Abstract

In this paper, we address the problem of searching huge biological databases on the scale of at least several gigabytes by utilizing parallel processing. Biological databases storing DNA sequences, protein sequences, or mass spectra are growing exponentially. Searches through these databases consume exponentially growing computational resources as well. We demonstrate herein a general use, MPI based, C ++ framework for generically splitting databases amongst several computational nodes. The combined RAM of the nodes working in tandem is often sufficient to keep the entire database in memory, and therefore to search it efficiently without paging to disk. The framework runs as a persistent service, processing all submitted queries. This allows for query reordering and better utilization of the memory. Thereby, we achieve superlinear speedups compared to single processor implementations. We demonstrate the utility and speedup of the framework using a real biological database and an actual searching algorithm for mass spectrometry.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.