Abstract

Because the vast majority (99%) of microbes in a given community are likely to be non-cultivable, metagenomics has gradually entered the mainstream of microbial research methods. With the development of high-throughput sequencing techniques, an increasing number of sequencing read data sets of metagenomes from various microbial communities have become available. For these data sets, metagenomic analysis based on mapping reads to microbial genomes has been hampered by the limited number of microbial genomes available, and this type of analysis is also computationally intensive. Alignment-free methods, which characterize sequencing reads by a genomic signature rather than by genomic alignments, offer an alternative. Their main requirement, however, is a stable genomic signature that performs reliably.

Here, we propose a novel genomic signature of microbial genomes called the intrinsic correlation of oligonucleotides (ICOs). This signature quantifies an intrinsic relationship between any two oligonucleotides. We analyzed microbial genomes at different taxonomic levels using ICO profiles and confirmed the wide availability of useful ICOs. We used intra-genomic and inter-genomic distances and relational grades to evaluate the performance of ICOs as a genomic signature. The results showed that ICOs characterize microbial genomes well and distinguish species better than tetranucleotide composition, both for whole genomes and for sequence fragments. In addition, we evaluated a hybrid feature that combined ICOs and tetranucleotide composition. The hybrid feature performed better than either ICOs or tetranucleotide composition alone.

ICOs can characterize microbial genomes successfully and are capable of distinguishing organisms at different taxonomic levels. ICOs perform better than tetranucleotide composition in characterizing microbial genomes. The hybrid feature, which combines the two kinds of sequence features, has advantages over either single sequence feature.
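
To make the idea of an alignment-free genomic signature concrete, the following is a minimal Python sketch of the tetranucleotide composition baseline mentioned above, together with a simple distance between two signatures. It does not implement ICOs, whose definition is not given in this abstract. The function names (`tetra_composition`, `euclidean`), the choice of Euclidean distance, and the toy sequences are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter
from itertools import product
import math

# All 256 tetranucleotides over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_composition(seq):
    """Return the relative frequency of each tetranucleotide in seq.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. Illustrative helper, not taken from the paper.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean(u, v):
    """Euclidean distance between two signature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy fragments, not real genomic data.
    frag_a1 = "ACGTACGTACGTTTGACGTACGT"
    frag_a2 = "ACGTACGTTTGACGTACGTACGA"
    frag_b = "GGGCCCGGGCCCGGGCCCTTAAG"

    sig_a1 = tetra_composition(frag_a1)
    sig_a2 = tetra_composition(frag_a2)
    sig_b = tetra_composition(frag_b)

    # Distance between fragments of the same toy "genome" versus
    # distance between fragments of different toy "genomes".
    print("intra:", euclidean(sig_a1, sig_a2))
    print("inter:", euclidean(sig_a1, sig_b))
```

In this sketch, a signature is considered useful when distances between fragments of the same genome tend to be smaller than distances between fragments of different genomes, which is the intuition behind the intra-genomic and inter-genomic comparison described in the abstract.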
