Abstract

BackgroundElucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification.ResultsHere, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a mean to gain more complete information. In particular, the contribution of experimental metagenomic databases was revealed to be mandatory when dealing with mouse samples. Moreover, the use of a “merged” database, containing all metagenomic sequences from the population under study, was found to be generally preferable over the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields.ConclusionsThis study contributes to identify host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools has to be encouraged, even though this requires appropriate bioinformatic resources.Electronic supplementary materialThe online version of this article (doi:10.1186/s40168-016-0196-8) contains supplementary material, which is available to authorized users.

Highlights

  • Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences

  • We describe a systematic evaluation of the impact of sequence DBs on gut metaproteomic results, providing useful insights on how to design gut microbiota studies

  • 12 were experimental metagenomic DBs, differing based on common sequencing and sequence processing variables, namely sequencing depth, read assembly level, and open reading frame (ORF) finding method; three were “pseudo-metagenomes”, containing UniProt sequences taxonomically filtered based on 16S-rDNA sequencing data at thee different taxonomic levels; four were sequencingindependent DBs, with the first comprising the entire set of bacterial sequences deposited in UniProt (UP-B) at the time of the analysis, and the remaining three constructed according to a previously described taxonomy-based proteomic iterative (PI) strategy [15, 20]

Read more

Summary

Introduction

Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are being expressed by specific microbial taxa. Its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Cross-species identification is often possible, owing to the high sequence similarity among orthologous genes from closely related microorganisms [12], but a single amino acid change is sufficient to hamper peptide-to-spectrum matching, making protein identification impossible. An optimized and standardized mass spectrometry (MS) data analysis pipeline for metaproteomics is not yet available

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call