Abstract

Background: A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome-assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies when matched metagenomic/metatranscriptomic data are unavailable. However, a larger collection of reference genomes does not necessarily result in better peptide/protein identification, because the increase in search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase in computation time.

Methods: Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for the metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from the identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification).

Results: We tested our approach using human gut metaproteomic datasets from a previous study and compared it to MetaPro-IQ, the state-of-the-art reference database search method for metaproteomic identification in studies of the human gut microbiota. Our results show that our two-step method not only ran significantly faster but also identified more peptides. We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data.

Conclusions: The HAP-guided profiling approach presents a novel and effective way of constructing the target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
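The two-step database construction described in the abstract can be illustrated with a minimal sketch. It is not the HAPiID implementation: the function names, the ranking criterion (number of distinct peptides identified per genome), the `top_n` cutoff, and the toy data are all assumptions made for illustration, and the spectral search itself is left to an external search engine.

```python
from collections import defaultdict


def rank_genomes(hap_hits):
    """Rank genomes by evidence from a HAP-only search (step 1).

    hap_hits: iterable of (peptide, genome_id) pairs, one per identification.
    Ranking by distinct peptide count is an assumed criterion, not
    necessarily the one used by HAPiID.
    """
    peptides_by_genome = defaultdict(set)
    for peptide, genome_id in hap_hits:
        peptides_by_genome[genome_id].add(peptide)
    # Most distinct peptides first; ties broken by genome id for determinism.
    return sorted(
        peptides_by_genome,
        key=lambda g: (-len(peptides_by_genome[g]), g),
    )


def build_expanded_database(proteomes, ranked_genomes, top_n):
    """Collect all proteins of the top-ranked genomes (step 2).

    proteomes: dict mapping genome_id -> {protein_id: sequence}.
    top_n: hypothetical cutoff on how many genomes to keep.
    """
    selected = ranked_genomes[:top_n]
    return {
        protein_id: sequence
        for genome_id in selected
        for protein_id, sequence in proteomes[genome_id].items()
    }


# Toy input: identifications from a search against HAPs only.
hap_hits = [
    ("PEPTIDEA", "genome_1"),
    ("PEPTIDEB", "genome_1"),
    ("PEPTIDEA", "genome_2"),
    ("PEPTIDEC", "genome_3"),
    ("PEPTIDEB", "genome_1"),  # repeated match, counted once
]
# Toy full proteomes for each candidate genome.
proteomes = {
    "genome_1": {"g1_p1": "MKV", "g1_p2": "MAL"},
    "genome_2": {"g2_p1": "MTT"},
    "genome_3": {"g3_p1": "MSS", "g3_p2": "MQQ"},
}

ranked = rank_genomes(hap_hits)
expanded = build_expanded_database(proteomes, ranked, top_n=2)
```

In this sketch the expanded database contains only the proteins of the two best-supported genomes, which is what keeps the second search space small compared with searching every protein from every reference genome.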

Highlights

  • Culture-independent studies of microbial communities associated with different environments are driven by two main factors: the significance of these communities to their environment/host, and rapid advances in sequencing technologies [1,2,3,4]

  • Taking advantage of the recent expansion of the human gut microbial genomes [34, 35], we developed a new two-step approach for human gut metaproteomic data analysis that uses over 3000 reference genomes and metagenome-assembled genomes (MAGs), first profiling microbial communities based on a spectral search against a database of high-abundance proteins (HAPs) encoded by these genomes

  • While this paper focuses on human gut metaproteomic data analysis, Highly abundant protein-guided metaproteomic Identification (HAPiID) can be customized for analyzing metaproteomic data associated with other environments or hosts


Summary

Introduction

Culture-independent studies of microbial communities associated with different environments are driven by two main factors: the significance of these communities to their environment/host, and rapid advances in sequencing technologies [1,2,3,4]. It has been shown that the genetic makeup and the diet of the host directly affect the composition of the gut bacteria, while the bacteria in turn regulate digestive, metabolic, and other processes of the host, creating a symbiotic relationship between the two [15,16,17]. Improvements in both experimental techniques (e.g., sequencing technology and sample collection [18]) and computational methods (such as those for binning and assembly [19]) have accelerated microbiome research. A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome-assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies. However, a larger collection of reference genomes does not necessarily result in better peptide/protein identification, because the increase in search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase in computation time.

