Abstract

For the taxonomic classification of microbes, 16S ribosomal RNA (rRNA) gene sequences are widely used in environmental microbiology as reliable markers. Although massive sequencing of 16S rRNA gene amplicons spanning the full length of the gene is difficult because of the limitations of current sequencing techniques, millions of rRNA gene sequences have been deposited in the Greengenes, RDP, and SILVA databases. In this research, a new similarity measure for full-length genes, LCSS, is first defined. It is then found that sequences reported for the same bacterial species show around 53% average sequence similarity in the Greengenes and SILVA databases, whereas the average similarity among genes reported for different bacterial species is only around 15%. At the genus level, the corresponding figures are around 63% and 20%, respectively, for the three databases Greengenes, RDP, and SILVA. Hence, species- and genus-specific sequences constitute useful targets for diagnostic assays and other scientific investigations. In the present research, the built-in function LongestCommonSubsequence of the computer algebra package Mathematica is used repeatedly to create an in silico pipeline for the taxonomic classification of newly uploaded full-length sequences. Conclusions: Our results suggest that LongestCommonSubsequence similarity can be used for the taxonomic classification of unknown bacteria through their full-length 16S rRNA gene sequences.
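
The abstract does not spell out how LCSS is computed, so the following Python sketch is only an illustration under stated assumptions. Mathematica's LongestCommonSubsequence returns the longest contiguous run shared by two strings; the sketch mimics that with dynamic programming and then applies it repeatedly to the segments on either side of each match. The normalization by the longer sequence length, the min_run cutoff, and the names longest_common_run, lcss_similarity, and classify are hypothetical choices, not the authors' definitions.

```python
from statistics import mean


def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest contiguous
    run of characters shared by a and b (first one found on ties)."""
    best_len = end_a = end_b = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, end_a, end_b = curr[j], i, j
        prev = curr
    return best_len, end_a - best_len, end_b - best_len


def matched_length(a, b, min_run=4):
    """Total length of shared runs, found by taking the longest run and
    then recursing on the segments to its left and to its right."""
    length, i, j = longest_common_run(a, b)
    if length < max(min_run, 1):  # assumed cutoff: ignore very short runs
        return 0
    left = matched_length(a[:i], b[:j], min_run)
    right = matched_length(a[i + length:], b[j + length:], min_run)
    return length + left + right


def lcss_similarity(a, b, min_run=4):
    """Assumed normalization: matched length over the longer sequence."""
    if not a or not b:
        return 0.0
    return matched_length(a, b, min_run) / max(len(a), len(b))


def classify(query, references, min_run=4):
    """Assign the label whose reference sequences have the highest mean
    similarity to the query. references maps label -> list of sequences."""
    def score(label):
        return mean([lcss_similarity(query, ref, min_run)
                     for ref in references[label]])
    return max(references, key=score)
```

With toy references such as `{"genusA": ["ACGTACGTAC"], "genusB": ["TTTTGGGGCC"]}`, calling `classify("ACGTACGTAA", refs)` picks the label with the higher mean score. Each comparison costs quadratic time in the sequence lengths, so a real pipeline over full-length 16S sequences would likely need a faster substring search.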

Highlights

  • Bacteria contribute immensely to global energy conversion and the recycling of matter in almost all environments explored

  • The flora of the human gut has been extensively explored for potential associations with the appearance of many human diseases (Duvallet et al., 2017; Forbes et al., 2016; Turnbaugh et al., 2006) [1-3]. The collection of microbes and their genes that exist within the human body and on its skin is known as the microbiome

  • As a result of these research activities, a substantial number of microbial community datasets have been deposited in sequence archives; for example, the European Nucleotide Archive currently holds over 600,000 environmental samples (Mitchell et al., 2017) [7], and the rate of deposition is climbing

Summary

INTRODUCTION

Bacteria contribute immensely to global energy conversion and the recycling of matter in almost all environments explored. The most widely used tools for analyzing microbial community sequence data are the mothur (Schloss et al., 2009) [9] and Quantitative Insights Into Microbial Ecology (QIIME) (Caporaso et al., 2010) [10] software packages. These are large toolsets that can process, classify, and perform downstream analyses on individual genetic markers such as the 16S rRNA gene, which is conserved across the prokaryotic domains. There is significant inconsistency in species nomenclature across all reference databases; for example, RDP does not report taxon names below genus. In their work, they calculated the degree of recall and precision at the genus and family ranks, as in our opinion they provide the best compromise between classification accuracy and resolution.

MATERIALS AND METHODS
Longest Common Subsequence Search
In-class and Interclass Similarities
RESULTS AND DISCUSSION
