Abstract

Background: Advances in biotechnology have changed how large populations of microbial communities, which are ubiquitous across many environments, are characterized. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems such as the ocean, soil, and the human body. Researchers are interested in metagenomics because it provides insight into the complex biodiversity of these environments. Clinicians are using metagenomics to determine the role that collections of microbial organisms within the human body play in human health, wellness, and disease.

Results: We have developed an efficient and scalable species richness estimation algorithm that uses locality sensitive hashing (LSH). The algorithm achieves its efficiency by using hashing to approximate pairwise sequence comparisons, and it adds a criterion based on matching fixed-length, gapless subsequences to improve the quality of those comparisons. We use an LSH-based similarity function to cluster similar sequences into groups called operational taxonomic units (OTUs). We then extend the analysis by computing several species diversity and richness metrics from the OTU assignments.

Conclusion: The algorithm is evaluated on synthetic samples and on eight targeted 16S rRNA metagenome samples taken from seawater. We compare its performance with several competing diversity estimation algorithms and show the benefits of our approach in terms of computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the algorithm by comparing bacterial diversity and structure across different skin locations.

Website: http://www.cs.gmu.edu/~mlbio/LSH-DIV
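
The abstract describes approximating pairwise sequence comparison with hashing over fixed-length, gapless subsequences and then grouping similar sequences into OTUs. The sketch below illustrates that general idea only. It is not the authors' implementation: the abstract does not say which LSH family LSH-Div uses, so MinHash over k-mer sets is swapped in here as an assumption, and the function names, the greedy assignment rule, and the defaults (`k=4`, `num_hashes=64`, `threshold=0.8`) are all hypothetical.

```python
import hashlib


def kmers(seq, k=4):
    """Return the set of fixed-length, gapless subsequences (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """MinHash signature of a sequence (assumes len(seq) >= k).

    Each salted hash function contributes the minimum hash value
    observed over all k-mers of the sequence.
    """
    kmer_set = kmers(seq, k)
    signature = []
    for h in range(num_hashes):
        salt = h.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for km in kmer_set
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions (estimates Jaccard similarity)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def greedy_otu_clustering(seqs, threshold=0.8, k=4, num_hashes=64):
    """Assign each sequence to the first OTU whose representative is similar enough.

    A sequence that matches no existing representative starts a new OTU.
    Returns one integer OTU label per input sequence.
    """
    representatives = []  # signature of the first sequence seen in each OTU
    labels = []
    for seq in seqs:
        sig = minhash_signature(seq, k, num_hashes)
        for otu_id, rep in enumerate(representatives):
            if estimated_similarity(sig, rep) >= threshold:
                labels.append(otu_id)
                break
        else:
            representatives.append(sig)
            labels.append(len(representatives) - 1)
    return labels


if __name__ == "__main__":
    reads = ["ACGTACGTACGT", "ACGTACGTACGT", "GGGGCCCCGGGG"]
    print(greedy_otu_clustering(reads))
```

Comparing each read only against one representative per OTU keeps the number of comparisons well below all-pairs, which is the kind of saving the abstract attributes to hashing. A production method would likely also bucket signatures so that most representatives are never compared at all.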

Highlights

  • Advances in biotechnology have changed how large populations of microbial communities, which are ubiquitous across many environments, are characterized. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems such as the ocean, soil, and the human body.

  • We assess the performance of the locality sensitive hashing-based LSH-Div algorithm with respect to the number of operational taxonomic units (OTUs), different species diversity metrics, and run time.

  • Synthetic dataset: to assess the accuracy and completeness of the LSH-Div algorithm, we evaluate our method on a synthetic dataset.
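
The highlights mention evaluating the number of OTUs and "different species diversity metrics" without naming them. As a rough illustration of how such metrics can be derived from OTU assignments, the sketch below computes the observed OTU count, the Shannon index, and the bias-corrected Chao1 richness estimate. Choosing these particular metrics is an assumption, not something the source states, and `diversity_metrics` is a hypothetical helper name.

```python
import math
from collections import Counter


def diversity_metrics(otu_labels):
    """Compute simple diversity metrics from one OTU label per sequence.

    The metric choice (observed OTUs, Shannon, Chao1) is an assumption;
    the source does not list the metrics it reports.
    """
    sizes = Counter(otu_labels)   # OTU label -> number of sequences
    n = sum(sizes.values())       # total number of sequences
    observed = len(sizes)         # number of distinct OTUs

    # Shannon index: H = -sum(p_i * ln p_i) over OTU relative abundances.
    shannon = -sum((c / n) * math.log(c / n) for c in sizes.values())

    # Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    # where F1 and F2 count OTUs containing exactly one and two sequences.
    f1 = sum(1 for c in sizes.values() if c == 1)
    f2 = sum(1 for c in sizes.values() if c == 2)
    chao1 = observed + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {"observed_otus": observed, "shannon": shannon, "chao1": chao1}


if __name__ == "__main__":
    # Two OTUs with two sequences each and two singleton OTUs.
    print(diversity_metrics([0, 0, 1, 1, 2, 3]))
```

The bias-corrected form of Chao1 is used because it stays defined when no OTU contains exactly two sequences.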


Summary

Introduction

Advances in biotechnology have changed how large populations of microbial communities, which are ubiquitous across many environments, are characterized. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems such as the ocean, soil, and the human body. Clinicians are using metagenomics to determine the role that collections of microbial organisms within the human body play in human health, wellness, and disease. New genomic technologies allow researchers to determine the DNA sequences of organisms that exist as communities across different environments [1], [2]. Metagenome samples consist of DNA sequences originating from all organisms in the examined environment. A comprehensive study of nucleotide sequence, structure, regulation, and biological function within a community can potentially reveal the roles played by microbial communities.

