Abstract

Background: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.

Results: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed “Blazmass”) to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples.

Conclusions: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2855-3) contains supplementary material, which is available to authorized users.

Highlights

  • Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences

  • Organizing large amounts of protein information into optimized search databases: to account for the immense size of a comprehensive protein database and optimize for the retrieval of information needed for efficient peptide-spectrum scoring, we organized our protein data into three distributed NoSQL databases (Fig. 1a) implemented using MongoDB

  • A comprehensive protein identification library (ComPIL) database was generated by amassing protein sequence data from a number of large, public sequencing projects (Fig. 1b, Additional file 1: Methods and Table S1)
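As a rough illustration of the kind of organization the highlights describe, the sketch below builds two in-memory lookup tables from a handful of protein sequences: one from peptide mass to peptide sequences, and one from peptide sequence to parent proteins. Plain Python dictionaries stand in for MongoDB collections. The function names, the tryptic cleavage rule, the minimum peptide length, and the integer milli-dalton mass key are all assumptions made for illustration; none of them is taken from ComPIL itself.

```python
from collections import defaultdict

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein, min_len=6):
    """Cleave after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            pep = protein[start:i + 1]
            if len(pep) >= min_len:
                peptides.append(pep)
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_indexes(proteins):
    """proteins: dict of protein id -> sequence. Returns two lookup tables."""
    seq_to_proteins = defaultdict(set)   # peptide sequence -> parent protein ids
    for pid, seq in proteins.items():
        for pep in tryptic_peptides(seq):
            seq_to_proteins[pep].add(pid)

    mass_to_seqs = defaultdict(set)      # integer milli-dalton mass -> peptide sequences
    for pep in seq_to_proteins:
        mass_to_seqs[round(peptide_mass(pep) * 1000)].add(pep)
    return mass_to_seqs, seq_to_proteins


# Hypothetical toy input: two proteins that share the peptide TAYIAK.
proteins = {"P1": "MKTAYIAKQRPEPTIDEK", "P2": "GGGGGGKTAYIAK"}
mass_to_seqs, seq_to_proteins = build_indexes(proteins)
```

Keying one table by mass and the other by sequence means a spectrum's precursor mass can narrow the search to a few peptides before any protein-level information is fetched, which is the retrieval pattern a very large database needs to stay fast.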


Summary

Introduction

Chatterjee et al. BMC Genomics (2016) 17:642

Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Peptide candidates for each mass spectrum are selected from this protein sequence database and scored against experimental MS/MS data. In this approach, high-scoring peptide candidate matches are chosen as peptide identifications for spectra after rigorous statistical filtering and post-processing [6,7,8]. For each of these analysis tools, a peptide sequence must be present in the chosen sequence database for it to be identified in a biological sample. This study underscores the value of searching against a comprehensive database.
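The candidate selection and scoring step can be sketched in a few lines. The example below is a minimal illustration, not the scoring used by ProLuCID or Blazmass: candidates are chosen by precursor mass within an assumed 10 ppm tolerance, and each is scored by a simple shared-peak count against singly charged b- and y-ions. All function names, tolerances, and the toy peptide list are hypothetical.

```python
import bisect

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def b_y_ions(peptide):
    """Theoretical m/z values of singly charged b- and y-ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions, prefix, suffix = [], 0.0, 0.0
    for m in masses[:-1]:
        prefix += m
        ions.append(prefix + PROTON)           # b-ion
    for m in reversed(masses[1:]):
        suffix += m
        ions.append(suffix + WATER + PROTON)   # y-ion
    return ions


def candidates(sorted_db, precursor_mass, ppm=10.0):
    """sorted_db: list of (mass, peptide) tuples sorted by mass."""
    tol = precursor_mass * ppm / 1e6
    lo = bisect.bisect_left(sorted_db, (precursor_mass - tol, ""))
    hi = bisect.bisect_right(sorted_db, (precursor_mass + tol, "\uffff"))
    return [pep for _, pep in sorted_db[lo:hi]]


def shared_peak_count(peptide, observed_mz, frag_tol=0.02):
    """Count theoretical fragments that fall within frag_tol of an observed peak."""
    return sum(
        any(abs(t - o) <= frag_tol for o in observed_mz)
        for t in b_y_ions(peptide)
    )


def best_match(sorted_db, precursor_mass, observed_mz):
    """Return (score, peptide) for the top-scoring candidate, or None if no candidate."""
    scored = [(shared_peak_count(p, observed_mz), p)
              for p in candidates(sorted_db, precursor_mass)]
    return max(scored, default=None)


# Hypothetical toy database; PEPTIDEK and PEPTIDKE have the same mass.
sorted_db = sorted((peptide_mass(p), p) for p in ["PEPTIDEK", "TAYIAK", "PEPTIDKE"])
spectrum = b_y_ions("PEPTIDEK")   # idealized, noise-free spectrum
match = best_match(sorted_db, peptide_mass("PEPTIDEK"), spectrum)
```

The sketch also shows why database content matters: `candidates` can only return peptides that are in `sorted_db`, so a peptide absent from the database can never be identified, however clean its spectrum.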
