Simrank: Rapid and sensitive general-purpose k-mer search tool

Todd Z Desantis, Alexander V Alekseyenko, Ulas Karaoz, Eoin L Brodie, Zhiheng Pei, Keith Keller, Navjeet Ns Singh, Gary L Andersen, Niels Larsen

doi:10.1186/1472-6785-11-11

Open Access

https://doi.org/10.1186/1472-6785-11-11

Abstract

Background: Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.

Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset.

Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.

Highlights

Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp
Indexing the list of institute names directly was impossible for Sequence Search and Alignment by Hashing Algorithm (SSAHA2), BLAST and megaBLAST, so an artificial conversion from language to DNA [25] was performed
Since BLAST constrains its results to only subregions of high similarity, it was run with parameter ‘-q -1’ to allow longer match regions and equitable comparison to Simrank

Summary

Introduction

Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. A rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Molecular ecology methods often require the collection of thousands of polymer sequences (DNA, RNA or proteins) extracted from biological specimens (isolates or communities), followed by a similarity search of those sequences against one or more reference databases. A general-purpose open-source software tool to aid biologists in performing all the aforementioned tasks is not readily available. Because Cd-hit does not allow the decoupling of k-mer searches from the clustering, it is not used as a general-purpose similarity reporting tool.
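To make the idea of a k-mer similarity search concrete, the following is a minimal Python sketch and not Simrank's actual implementation. It assumes that similarity is scored as the number of unique k-mers shared by the query and a database string, divided by the smaller of the two unique k-mer counts. The function names (`kmers`, `build_index`, `rank`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(s, k):
    """Return the set of unique k-mers (length-k substrings) of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database, k):
    """Build an inverted index: k-mer -> names of database strings containing it.

    Also records how many unique k-mers each database string has.
    """
    index = defaultdict(set)
    sizes = {}
    for name, seq in database.items():
        unique = kmers(seq, k)
        sizes[name] = len(unique)
        for km in unique:
            index[km].add(name)
    return index, sizes


def rank(query, index, sizes, k, top=3):
    """Rank database strings by the fraction of unique k-mers shared with query.

    Assumed score: shared unique k-mers / min(unique k-mers in query,
    unique k-mers in database string). Strings sharing no k-mer are omitted.
    """
    query_kmers = kmers(query, k)
    shared = defaultdict(int)
    for km in query_kmers:
        for name in index.get(km, ()):
            shared[name] += 1
    scores = [
        (name, count / min(len(query_kmers), sizes[name]))
        for name, count in shared.items()
    ]
    # Highest score first; ties broken by name for a stable order.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


if __name__ == "__main__":
    # Toy database; real inputs would be read from sequence files.
    database = {"a": "ACGTACGTAC", "b": "TTTTGGGGCC", "c": "ACGTTTTT"}
    index, sizes = build_index(database, k=4)
    print(rank("ACGTACG", index, sizes, k=4))
```

Because the index is keyed on k-mers, each query only touches database strings that share at least one k-mer with it, which is the general reason such searches can be fast. The same code works on any string data, including human-language text, since it makes no assumption about the alphabet.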

Methods

Results

Discussion

Conclusion

Journal: BMC Ecology	Publication Date: Jan 1, 2011
Citations: 30	License type: cc-by
