Abstract

Advances in sequencing technology mean that sequence data production is no longer a constraint in microbiology, making it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of the resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomic samples. We apply PGHMs to detect noise and bias in data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis, which shows different clustering of sequences from each sample, to support our findings and illustrate the utility of PGHMs. We considered the contributions of read length and GC-content variation and observed that in most cases the biases were due to the GC-content of the reads.
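The two read properties considered above, read length and GC-content, can be summarized per sample in a few lines. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `gc_content` and `summarize_sample` and the toy reads are hypothetical, and ambiguous bases such as N are simply excluded from the GC calculation.

```python
from statistics import mean


def gc_content(read: str) -> float:
    """Fraction of G/C among the unambiguous (A/C/G/T) bases of one read."""
    seq = read.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Reads with no unambiguous bases (empty or all-N) are reported as 0.0.
    return gc / total if total else 0.0


def summarize_sample(reads):
    """Mean read length and mean per-read GC-content for one sample."""
    reads = [r for r in reads if r]  # drop empty reads
    if not reads:
        return {"n_reads": 0, "mean_length": 0.0, "mean_gc": 0.0}
    return {
        "n_reads": len(reads),
        "mean_length": mean(len(r) for r in reads),
        "mean_gc": mean(gc_content(r) for r in reads),
    }


# Hypothetical toy reads standing in for two samples of the same community,
# e.g. produced with different extraction protocols or sequencing platforms.
sample_a = ["ATGCGC", "GGCCAT", "ATATAT"]
sample_b = ["GGGCCC", "GCGCGC", "ATGCGC"]

summary_a = summarize_sample(sample_a)
summary_b = summarize_sample(sample_b)

# A non-zero shift in mean GC-content at equal read length hints that
# composition, rather than read length, separates the two samples.
gc_shift = summary_b["mean_gc"] - summary_a["mean_gc"]
print(summary_a, summary_b, gc_shift)
```

Comparing such summaries only flags a difference in composition between samples; it does not attribute the difference to a particular protocol or platform.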

Highlights

  • In recent years, metagenomics has emerged as a powerful tool involving the study of the genome of microbial communities by sequencing microbial DNA extracted directly from environmental samples

  • We investigated the effectiveness of clustering in principal component analysis (PCA) using our phylogenetic heatmaps (PGHMs)

  • We conclude that the sequence composition of samples that were supposed to represent similar/comparable metagenomes varied depending on the sequencing technologies used, and our method is efficient in detecting/finding variation

Introduction

Metagenomics has emerged as a powerful tool for studying the genomes of microbial communities by sequencing microbial DNA extracted directly from environmental samples. Although metagenomics and high-throughput sequencing have expanded our knowledge in the field of microbiology by directly accessing microbial community genomes, there are still uncertainties in the data. There are many challenges in analyzing metagenomic data, including the assessment of microbial abundance in environmental samples using the frequency of occurrence of an organism’s DNA observed in sequencing reads [2]. Such frequencies and the corresponding abundance estimates depend significantly on the DNA extraction and sequencing protocols used. Several groups have investigated the bias imposed by (i) various
