Combinatorial Approach for Large-scale Identification of Linked Peptides from Tandem Mass Spectrometry Spectra

Jian Wang, Veronica G Anania, Jeff Knott, John Rush, Jennie R Lill, Philip E Bourne, Nuno Bandeira

doi:10.1074/mcp.m113.035758

Abstract

The combination of chemical cross-linking and mass spectrometry has recently been shown to be a powerful tool for studying protein-protein interactions and elucidating the structure of large protein complexes. However, computational methods for interpreting the complex MS/MS spectra of linked peptides are still in their infancy, which makes high-throughput application of this approach largely impractical. Because large annotated datasets are lacking, most current approaches do not capture the specific fragmentation patterns of linked peptides and are therefore not optimal for identifying cross-linked peptides. Here we propose a generic approach to address this problem and demonstrate it using disulfide-bridged peptide libraries to (i) efficiently generate large mass spectral reference data for linked peptides at low cost and (ii) automatically train an algorithm that can efficiently and accurately identify linked peptides from MS/MS spectra. Using this approach, we identified thousands of MS/MS spectra from disulfide-bridged peptides through comparison with proteome-scale sequence databases and significantly improved the sensitivity of cross-linked peptide identification. In cross-linking studies of proteasome complexes, this allowed us to identify 60% more direct pairwise interactions between the protein subunits of the 20S proteasome complex than existing tools. The basic framework of this approach and the MS/MS reference dataset generated should be valuable resources for the future development of new tools for identifying linked peptides.
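Matching a spectrum of a disulfide-bridged peptide against a proteome-scale sequence database requires finding pairs of cysteine-containing peptides whose combined mass equals the precursor mass, and the number of candidate pairs grows quadratically with the number of such peptides. The sketch below is a generic illustration of this precursor-mass filtering step, not the MXDB implementation. It assumes standard monoisotopic residue masses, one inter-peptide disulfide bond that removes two hydrogen atoms, and a 10 ppm tolerance; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum of a free peptide
H_ATOM = 1.007825   # hydrogen atom; forming one S-S bond removes two of these
PROTON = 1.007276   # charge carrier in positive-mode electrospray


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def linked_mass(a, b):
    """Neutral mass of peptides a and b joined by a single disulfide bond (assumption)."""
    return peptide_mass(a) + peptide_mass(b) - 2 * H_ATOM


def candidate_pairs(precursor_mz, charge, peptides, tol_ppm=10.0):
    """Hypothetical helper: unordered cysteine-peptide pairs matching a precursor m/z."""
    neutral = (precursor_mz - PROTON) * charge
    tol = neutral * tol_ppm * 1e-6
    # Only cysteine-containing peptides can take part in an inter-peptide disulfide.
    cys = sorted((peptide_mass(p), p) for p in set(peptides) if "C" in p)
    masses = [m for m, _ in cys]
    hits = []
    for i, (ma, a) in enumerate(cys):
        # Partner mass needed so that ma + mb - 2H equals the precursor mass.
        need = neutral - ma + 2 * H_ATOM
        # Searching only j >= i reports each unordered pair (and homodimer) once.
        lo = bisect_left(masses, need - tol, lo=i)
        hi = bisect_right(masses, need + tol, lo=i)
        hits.extend((a, cys[j][1]) for j in range(lo, hi))
    return hits
```

Sorting by mass and binary-searching for each partner avoids explicitly enumerating every pair, but the number of pairs that survive the filter still has to be scored against the spectrum, which is where a model of linked-peptide fragmentation matters.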

Highlights

The study of protein–protein interactions is crucial to understanding how cellular systems function because proteins act in concert through a highly organized set of interactions.
Employing disulfide-bridged peptides as an example, we propose a novel method that uses a combinatorial peptide library to (a) efficiently generate a large mass spectral reference dataset for linked peptides and (b) automatically train our new algorithm, MXDB, on these data so that it can efficiently and accurately identify linked peptides from MS/MS spectra.
Chemical cross-linking followed by tandem mass spectrometry is a versatile strategy for the analysis of protein structures and protein–protein interactions.

Summary

Introduction

The study of protein–protein interactions is crucial to understanding how cellular systems function because proteins act in concert through a highly organized set of interactions. In the past several years, numerous high-throughput studies have pioneered the systematic characterization of protein–protein interactions in model organisms [2,3,4]. Such studies mainly use two techniques: the yeast two-hybrid system, which aims to identify binary interactions [5], and affinity purification combined with tandem mass spectrometry for the identification of multi-protein assemblies [6,7,8]. Together, these techniques have led to a rapid expansion of known protein–protein interactions in humans and in model organisms.

Employing disulfide-bridged peptides as an example, we propose a novel method that uses a combinatorial peptide library to (a) efficiently generate a large mass spectral reference dataset for linked peptides and (b) automatically train our new algorithm, MXDB, on these data so that it can efficiently and accurately identify linked peptides from MS/MS spectra.
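Training an algorithm to recognize linked-peptide spectra presupposes some model of which fragment masses a linked pair can produce. The sketch below uses a deliberately simplified model that is an assumption, not the fragmentation model learned by MXDB: the disulfide bond stays intact, only one peptide's backbone is cleaved per fragment, each peptide has a single cysteine, and only singly charged b and y ions are considered. Under these assumptions, any fragment of peptide `a` that contains its cysteine carries the whole partner peptide `b` (minus two hydrogen atoms) as a mass shift. The function name `linked_fragment_ions` is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum of a free peptide
H_ATOM = 1.007825   # hydrogen atom; forming one S-S bond removes two of these
PROTON = 1.007276   # charge carrier for singly protonated fragments


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def linked_fragment_ions(a, b):
    """Hypothetical helper: singly charged b/y ions of `a` while it stays linked to intact `b`.

    Assumes exactly one cysteine in `a` and an unbroken disulfide bond.
    """
    site = a.index("C")                      # position of the linked cysteine in `a`
    shift = peptide_mass(b) - 2 * H_ATOM     # intact partner, minus the two bridge hydrogens
    n = len(a)
    ions = {}
    for k in range(1, n):
        b_res = sum(MONO[aa] for aa in a[:k])   # residues 0..k-1 (N-terminal fragment)
        y_res = sum(MONO[aa] for aa in a[k:])   # residues k..n-1 (C-terminal fragment)
        # Whichever fragment contains the cysteine carries the partner peptide.
        ions[f"b{k}"] = b_res + PROTON + (shift if site < k else 0.0)
        ions[f"y{n - k}"] = y_res + WATER + PROTON + (shift if site >= k else 0.0)
    return ions
```

For example, in `linked_fragment_ions("GCLLR", "ACDEFK")` the `b1` ion (glycine alone) is unshifted, while `b2` onward and `y4` carry the partner's mass. A real scoring model would also need to account for disulfide cleavage, multiply charged fragments, and fragments of the partner peptide, which is the kind of pattern that annotated reference spectra make it possible to learn.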

Methods

Results

Conclusion