Abstract

Introduction: Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences, but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets.

Methods: Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences.

Results: The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similarly high levels of both precision and recall.

Conclusions: Kelpie can be thought of as somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset.
Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application, however: it can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as the region is bounded by a pair of highly conserved primer sequences.
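The kMer filtering step named in the Methods summary can be illustrated with a minimal Python sketch. It is not Kelpie's code: it assumes unambiguous A/C/G/T primers and reads, exact kMer matches and an arbitrary abundance cut-off, and the function names (`filter_reads`, `trusted_kmers`) are hypothetical. It shows two common kMer ideas: recruiting reads that share kMers with the primers, and discarding rare kMers as probable sequencing errors.

```python
# Illustrative sketch only; not Kelpie's implementation.
from collections import Counter

# Complement table for unambiguous bases only (assumption: no degenerate codes here).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def filter_reads(reads, primers, k):
    """Hypothetical helper: keep reads sharing a k-mer with any primer on either strand."""
    targets = set()
    for primer in primers:
        for strand in (primer, revcomp(primer)):
            targets.update(kmers(strand, k))
    return [read for read in reads if any(km in targets for km in kmers(read, k))]


def trusted_kmers(reads, k, min_count=2):
    """Hypothetical helper: keep canonical k-mers seen at least min_count times.

    Rarer k-mers are treated as likely sequencing errors (a common heuristic;
    the threshold is an arbitrary assumption).
    """
    counts = Counter(canonical(km) for read in reads for km in kmers(read, k))
    return {km for km, n in counts.items() if n >= min_count}


# Hypothetical primers and reads, for illustration only.
primers = ["ACGTACGTAC", "GGCCAATTGG"]
reads = ["TTACGTACGTACAAAT", "CCCCCCCCCCCC", "AACCAATTGGCCTT"]
kept = filter_reads(reads, primers, k=8)
print(kept)  # ['TTACGTACGTACAAAT', 'AACCAATTGGCCTT']
```

Recovering a full inter-primer region would also require recruiting reads that overlap these primer-containing reads and then assembling them; the sketch stops at the initial filtering.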

Highlights

  • Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging

  • The results from the coal seam metagenome study showed that Kelpie-generated sequences could produce microbial community profiles with an accuracy and depth comparable to conventional PCR, and that the centroid sequences for the resulting Kelpie and PCR OTU clusters were either identical or found within the small ‘cloud’ of sequences subsumed within each cluster

  • The results show that Kelpie can successfully extract and assemble full-length inter-primer genomic regions from whole-metagenome sequencing datasets with high precision and recall, even for challenging regions such as the ubiquitous and repeated 16S rRNA gene


Summary

Introduction

Kelpie can be thought of as an in silico PCR program: it takes a pair of primer sequences and a whole-metagenome sequencing (WGS) dataset, and generates a corresponding set of inter-primer, amplicon-like sequences. Whole-metagenome sequencing datasets can be a rich resource for investigating both the structure of a metagenomic community and the functional capabilities of its members, but reliably and accurately extracting such information from large volumes of sequencing data can be challenging. These challenges arise from the nature of the sequencing data itself, the presence of ubiquitous and highly conserved genomic regions, and the possible presence of related organisms within the community. Assembling the metagenomes can generate much longer and more distinctive sequences, which can be used to determine the presence of particular organisms or genes more reliably, but metagenomic assembly is itself challenging in the presence of conserved regions and related organisms (Treangen & Salzberg, 2012; Wang et al., 2015).
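Kelpie itself works from reads, through kMer filtering and assembly, rather than by scanning finished sequences, but the in silico PCR analogy can be made concrete with a minimal sketch that scans an already-assembled template for a primer pair. It assumes primers written 5'→3' with IUPAC degenerate codes, the reverse primer given on the opposite strand, exact matches only (real primer binding tolerates some mismatches) and a hypothetical `max_len` cap; the function names are illustrative and not part of Kelpie.

```python
# Illustrative in silico PCR sketch on an assembled sequence; not Kelpie's algorithm.
import re

# IUPAC nucleotide codes as regex character classes, so degenerate primers can match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving degenerate codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Translate a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 2000) -> list[str]:
    """Return inter-primer sequences found on the forward strand of template.

    fwd is matched as written; rev is written 5'->3' on the opposite strand, so
    its reverse complement is searched for downstream. Matches are exact, the
    region between primers must be plain A/C/G/T, and for each forward-primer
    site the shortest product of up to max_len bases is reported.
    """
    # Lookahead lets overlapping products be found; the lazy quantifier picks the shortest.
    pattern = re.compile(
        f"(?={primer_regex(fwd)}([ACGT]{{0,{max_len}}}?){primer_regex(revcomp(rev))})"
    )
    return [m.group(1) for m in pattern.finditer(template.upper())]


def in_silico_pcr_both(template: str, fwd: str, rev: str, max_len: int = 2000) -> list[str]:
    """Scan both strands of the template."""
    return in_silico_pcr(template, fwd, rev, max_len) + in_silico_pcr(
        revcomp(template), fwd, rev, max_len
    )


# Hypothetical degenerate primer pair (R = A or G) and toy template.
print(in_silico_pcr("GGGACGTGCATCATCGGTTCCC", "ACGTR", "AACCG"))  # ['CATCAT']
```

This sketch returns only the bases strictly between the primer sites; whether primer sequences are kept in the output is a design choice.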

