Abstract
To analyze complex biodiversity in microbial communities, 16S rRNA marker gene sequences are often assigned to operational taxonomic units (OTUs). The abundance of methods used to assign 16S rRNA marker gene sequences to OTUs has prompted debate over which one is better. Some researchers suggest that clustering methods should be stable, meaning that the generated OTU assignments do not change as additional sequences are added to the dataset; others contend that it is more important for a method to properly represent the distances between sequences. We add one more de novo clustering algorithm, Rolling Snowball, to the existing ones, which include single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm, as well as the open- and closed-reference methods. We use the GreenGenes, RDP, and SILVA 16S rRNA gene databases to show the success of the method. The highest accuracy is obtained with the SILVA library.
Highlights
Metagenomics is a recently born and highly popular field that studies the genomic content of microbial communities living in particular environments. It aims to understand the structure and function of these communities by sequencing genomic fragments from environmental samples, without the need to cultivate the organisms in a laboratory (Huttenhower et al, 2012; Qin et al, 2010).
This type of clustering is referred to as phylotyping (Schloss & Westcott, 2011) or closed-reference clustering (Navas-Molina et al, 2013). This approach compares sequence reads to a reference database and clusters reads that are similar to the same reference sequence into the same operational taxonomic unit (OTU).
In de novo clustering (Navas-Molina et al, 2013), which is also referred to as distance-based clustering (Schloss & Westcott, 2011), the distances between the sequences themselves, rather than distances to a reference database, are used to bin sequences into OTUs.
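The contrast between closed-reference and de novo clustering can be illustrated with a minimal Python sketch. It is not the Rolling Snowball algorithm or any published tool. The function names (`identity`, `closed_reference`, `de_novo_greedy`), the toy sequences, and the 0.9 identity threshold are assumptions chosen for illustration, and sequences are assumed to be pre-aligned to equal length so that identity is a simple per-position match fraction.

```python
from typing import Dict, List, Optional


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads: Dict[str, str],
                     references: Dict[str, str],
                     threshold: float) -> Dict[str, List[str]]:
    """Assign each read to its most similar reference sequence.

    Reads whose best match falls below the threshold are left unassigned,
    so every OTU is named after a reference entry.
    """
    otus: Dict[str, List[str]] = {}
    for read_id, seq in reads.items():
        best_ref: Optional[str] = None
        best_score = 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_ref, best_score = ref_id, score
        if best_ref is not None and best_score >= threshold:
            otus.setdefault(best_ref, []).append(read_id)
    return otus


def de_novo_greedy(reads: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy distance-based clustering that uses no reference database.

    Reads are processed in input order (an assumption of this sketch).
    A read joins the first OTU whose seed it matches at or above the
    threshold; otherwise it seeds a new OTU.
    """
    otus: Dict[str, List[str]] = {}
    for read_id, seq in reads.items():
        for seed_id in otus:
            if identity(seq, reads[seed_id]) >= threshold:
                otus[seed_id].append(read_id)
                break
        else:
            # No existing seed is close enough, so this read starts a new OTU.
            otus[read_id] = [read_id]
    return otus


# Hypothetical toy data: four 10-base reads and a single reference sequence.
reads = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTACGTAT",  # one mismatch from r1
    "r3": "TTTTGGGGCC",
    "r4": "TTTTGGGGCA",  # one mismatch from r3
}
references = {"refA": "ACGTACGTAC"}

print(closed_reference(reads, references, threshold=0.9))
# {'refA': ['r1', 'r2']}
print(de_novo_greedy(reads, threshold=0.9))
# {'r1': ['r1', 'r2'], 'r3': ['r3', 'r4']}
```

In this toy case the closed-reference pass leaves r3 and r4 unassigned because nothing in the reference set resembles them, whereas the de novo pass places them in their own OTU. The de novo result depends on the order in which reads are processed, which bears on the stability question raised in the abstract.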
Summary
Metagenomics is a recently born and highly popular field that studies the genomic content of microbial communities living in particular environments. It aims to understand the structure and function of these communities by sequencing genomic fragments from environmental samples, without the need to cultivate the organisms in a laboratory (Huttenhower et al, 2012; Qin et al, 2010). Rapid development in next-generation sequencing (NGS) has made it possible to directly sequence a huge number of DNA/RNA fragments extracted from environmental samples such as the human gut, marine water, or soil in a reasonable time (Eisen, 2011). It has made sequencing faster and far more economical, providing a unique opportunity to study the microbial diversity of many complex environments at a much lower cost (Desai et al, 2013). To reduce the complexity of the large datasets generated by NGS technologies, sequences are clustered into meaningful bins. These bins are called operational taxonomic units (OTUs) and are used to study biodiversity within and between samples (Schloss & Westcott, 2011). Popular reference databases include the Ribosomal Database Project (RDP) (Cole et al, 2009), Greengenes (DeSantis et al, 2006), SILVA (Pruesse et al, 2007), NCBI (Federhen, 2012), Open Tree of Life Taxonomy (OTT) (Hinchliff et al, 2014), and UNITE (Kõljalg et al, 2013).