Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes

Michelle L Treiber, David A Mills, Danielle G Lemay, Ian Korf, Diana H Taft

doi:10.1186/s12859-020-3416-y

Open Access

https://doi.org/10.1186/s12859-020-3416-y

Abstract

Background: Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The ability to detect genes in short read sequences is dependent on pre- and post-sequencing decisions. The objective of the current study was to determine how library size selection, read length and format, protein database, e-value threshold, and sequencing depth impact gene-centric analysis of human fecal microbiomes when using DIAMOND, an alignment tool that is up to 20,000 times faster than BLASTX.

Results: Using metagenomes simulated from a database of experimentally verified protein sequences, we find that read length, e-value threshold, and the choice of protein database dramatically impact detection of a known target, with best performance achieved with longer reads, stricter e-value thresholds, and a custom database. Using publicly available metagenomes, we evaluated library size selection, paired end read strategy, and sequencing depth. Longer read lengths were achievable by merging paired ends when the sequencing library was size-selected to enable overlaps. When paired ends could not be merged, a congruent strategy in which both ends are independently mapped was acceptable. Sequencing depths of 5 million merged reads minimized the error of abundance estimates of specific target genes, including an antimicrobial resistance gene.

Conclusions: Shotgun metagenomes of DNA extracted from human fecal samples and sequenced using the Illumina platform should be size-selected to enable merging of paired end reads and should be sequenced in the PE150 format with a minimum sequencing depth of 5 million merge-able reads to enable detection of specific target genes. Because the merged reads are expected to be 180-250 bp in length, the e-value threshold for DIAMOND should be stricter than the default. Accurate and interpretable results for specific hypotheses will be best obtained using small databases customized for the research question.
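The recommendation to apply a stricter e-value threshold can also be enforced after alignment. The following is a minimal sketch, not the authors' pipeline: it assumes the hits are in the 12-column BLAST tabular format that DIAMOND writes by default, and the function name `filter_hits`, the example hits, and the cutoff `STRICT_EVALUE` are illustrative assumptions rather than values taken from the paper.

```python
import csv
import io

# Column order of the 12-column BLAST tabular format (assumed input layout).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue):
    """Yield hits (as dicts) whose e-value is at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            yield hit


# Two made-up hits for illustration only.
example = (
    "read1\tprotA\t98.0\t60\t1\t0\t1\t180\t10\t69\t1e-30\t120.0\n"
    "read2\tprotB\t45.0\t30\t16\t0\t1\t90\t5\t34\t5e-4\t40.0\n"
)

STRICT_EVALUE = 1e-5  # hypothetical cutoff, not a value recommended by the paper
kept = list(filter_hits(io.StringIO(example), STRICT_EVALUE))
print([h["qseqid"] for h in kept])  # ['read1']
```

Filtering after the fact keeps the raw alignments available, so several thresholds can be compared without re-running the aligner.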

Highlights

Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its most abundant members.
The 100 simulated metagenomes were mapped to a custom beta-galactosidase database, the NCBI RefSeq database [13], or the SEED database [14] using DIAMOND with the “sensitive” flag and the default e-value threshold.
True positive rates were expressed as the proportion of reads known to originate from the target that were correctly identified as the target.
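The true positive rate defined above can be written as a short calculation. This is a sketch under stated assumptions: the read identifiers and the function name `true_positive_rate` are hypothetical, and the two sets stand in for the known origin of simulated reads and the reads an aligner assigned to the target.

```python
def true_positive_rate(target_reads, called_reads):
    """Proportion of reads known to come from the target that were called as the target.

    target_reads: set of read IDs known to originate from the target
    called_reads: set of read IDs the aligner identified as the target
    """
    if not target_reads:
        raise ValueError("no reads originate from the target; rate is undefined")
    correctly_called = target_reads & called_reads
    return len(correctly_called) / len(target_reads)


# Illustrative IDs: four reads truly come from the target, and the aligner
# calls three reads as target, one of which ("r5") is a false positive.
target_reads = {"r1", "r2", "r3", "r4"}
called_reads = {"r1", "r2", "r5"}
print(true_positive_rate(target_reads, called_reads))  # 0.5
```

Reads called as the target but originating elsewhere, such as `r5` here, do not enter this rate; they would be counted separately as false positives.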

Summary

Introduction

Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. One common way of functionally annotating a shotgun metagenome is to first assemble the reads into longer fragments of DNA called contigs, predict open reading frames (ORFs) in the contigs, and map these ORFs to gene family databases. Assembly is valuable because it can provide genomic context for individual genes. However, if the objective of the study is to assess the overall functional capacity of a community, it may not be necessary to put sequences in the context of their individual genomes.


Journal: BMC Bioinformatics	Publication Date: Feb 24, 2020
Citations: 15	License type: open-access
