A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics.

Oliver Serang, Haixu Tang, Yuzhen Ye, Sujun Li

doi:10.1371/journal.pcbi.1005224

Open Access

Journal: PLoS Computational Biology
Publication Date: Dec 5, 2016
Citations: 36
License type: CC BY 4.0

Affiliation: Indiana University Bloomington

Abstract

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins could be identified when assembly graphs were used, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in the degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.

Highlights

Microbiome studies have produced massive metagenomic data and, more recently, other meta-omic data, including metatranscriptomic and metaproteomic data [1].
Meta-omic techniques have been adopted as complementary approaches to metagenomic sequencing for studying the functional characteristics and dynamics of microbial communities, aiming at a holistic understanding of how a community responds to changes in the environment.
Metagenome-guided identification first predicts complete or fragmental protein-coding genes from metagenomic sequences acquired from the matched community samples, then uses the predicted protein sequences for peptide identification.

Summary

Introduction

Microbiome studies have produced massive metagenomic data and, more recently, other meta-omic data, including metatranscriptomic and metaproteomic data [1]. Many peptide search engines have been developed in the proteomics field for identifying peptides from MS/MS spectra, including commonly used tools such as Mascot [25], Sequest [26], X!Tandem [27], InsPecT [28] and MS-GF+ [29]. Their application in metaproteomics relies on a pre-assembled protein database. Metaproteomic studies have used, as the target database, the collection of proteins encoded by the fully sequenced genomes of bacteria that likely live in the environment (e.g., human gut [11]). This collection may be largely incomplete; e.g., a large fraction (10%-34%) of genes from HMP [30] or MetaHIT [31] shotgun sequencing are completely novel [6]. The target peptides collected in this manner may miss many full-length tryptic peptides that could be observed in metaproteomic experiments.
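To illustrate why fragmented sequences can miss full-length tryptic peptides, and why following graph edges can recover them, here is a minimal Python sketch. It is not the Graph2Pro implementation. It assumes a hypothetical toy graph whose node labels are already-translated protein fragments joined without overlap, whereas real assembly graphs are nucleotide graphs with overlapping nodes that must be translated first. The names `digest`, `paths`, `nodes`, and `edges` are invented for this example, and the cleavage rule (cut after K or R unless followed by P) is a common simplification of trypsin specificity.

```python
import re

def digest(seq):
    """Cleave after K or R unless the next residue is P (a common trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

# Hypothetical toy graph (assumption): node labels are already-translated
# fragments, and an edge u -> v means v directly continues u with no overlap.
nodes = {"A": "MSTAKLVEG", "B": "DWQRHPLK", "C": "NNPRAAK"}
edges = {"A": ["B", "C"], "B": [], "C": []}

def paths(start, max_nodes=2):
    """Yield node paths of at most max_nodes nodes beginning at start."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        if len(path) < max_nodes:
            for nxt in edges[path[-1]]:
                stack.append(path + [nxt])

# Baseline: digest every fragment on its own.
per_node = {p for seq in nodes.values() for p in digest(seq)}

# Graph-aware: digest the sequence spelled by each short path.
per_path = set()
for start in nodes:
    for path in paths(start):
        per_path.update(digest("".join(nodes[n] for n in path)))

# Peptides that only appear when an edge is followed.
spanning = per_path - per_node
print(sorted(spanning))
```

In this toy case the peptides that cross the A-to-B and A-to-C edges exist only in the graph-aware set, because digesting each fragment separately truncates them at the fragment boundary. The `max_nodes` bound keeps the traversal finite even if the graph contains cycles.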

Methods

Discussion

Conclusion