Abstract

Chemical cross-linking mass spectrometry identifies interacting surfaces within a protein assembly through labeling with bifunctional reagents and identifying the covalently modified peptides. These yield distance constraints that provide a powerful means to model the three-dimensional structure of the assembly. Bioinformatic analysis of cross-linked data resulting from large protein assemblies is challenging because each cross-linked product contains two covalently linked peptides, each of which must be correctly identified from a complex matrix of potential confounders. Protein Prospector addresses these issues through a complementary mass modification strategy in which each peptide is searched and identified separately. We demonstrate this strategy with an analysis of RNA polymerase II. False discovery rates (FDRs) are assessed via comparison of cross-linking data to crystal structure, as well as by using a decoy database strategy. Parameters that are most useful for positive identification of cross-linked spectra are explored. We find that fragmentation spectra generally contain more product ions from one of the two peptides constituting the cross-link. Hence, metrics reflecting the quality of the spectral match to the less confident peptide provide the most discriminatory power between correct and incorrect matches. A support vector machine model was built to further improve classification of cross-linked peptide hits. Furthermore, the frequency with which peptides cross-linked via common acylating reagents fragment to produce diagnostic, cross-linker-specific ions is assessed. The threshold for successful identification of the cross-linked peptide product depends upon the complexity of the sample under investigation. Protein Prospector, by focusing the reliability assessment on the least confident peptide, is better able to control the FDR for results as larger complexes and databases are analyzed. 
In addition, when FDR thresholds are calculated separately for intraprotein and interprotein results, a further improvement in the number of unique cross-links confidently identified is achieved. These improvements are demonstrated on two previously published cross-linking datasets.
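The decoy-based FDR estimate and the separate intraprotein and interprotein thresholds described above can be sketched roughly as follows. This is an illustrative sketch, not Protein Prospector's implementation. The `Hit` record, the function names, and the choice of estimator, (TD − DD) / TT, which is one commonly used form for cross-link data, are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one cross-linked spectrum match."""
    score: float          # classification score, higher is better
    pep1_decoy: bool      # True if peptide 1 matched a decoy sequence
    pep2_decoy: bool      # True if peptide 2 matched a decoy sequence
    intraprotein: bool    # True if both peptides come from the same protein


def estimate_fdr(hits, threshold):
    """Estimate FDR among hits scoring >= threshold as (TD - DD) / TT.

    TT = target-target, TD = one decoy peptide, DD = two decoy peptides.
    The estimator is an assumption of this sketch.
    """
    accepted = [h for h in hits if h.score >= threshold]
    tt = sum(1 for h in accepted if not h.pep1_decoy and not h.pep2_decoy)
    dd = sum(1 for h in accepted if h.pep1_decoy and h.pep2_decoy)
    td = len(accepted) - tt - dd
    if tt == 0:
        # No accepted target hits: treat the estimate as unusable.
        return float("inf")
    return max(td - dd, 0) / tt


def threshold_for_fdr(hits, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for t in sorted({h.score for h in hits}):
        if estimate_fdr(hits, t) <= max_fdr:
            return t
    return None  # no threshold satisfies the requested FDR


def thresholds_by_class(hits, max_fdr):
    """Compute separate thresholds for intraprotein and interprotein hits."""
    intra = [h for h in hits if h.intraprotein]
    inter = [h for h in hits if not h.intraprotein]
    return {
        "intraprotein": threshold_for_fdr(intra, max_fdr),
        "interprotein": threshold_for_fdr(inter, max_fdr),
    }
```

Splitting the hits before thresholding lets each class be cut at its own score, rather than forcing a single cutoff on two populations whose decoy rates may differ.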

Highlights

  • Protein Prospector addresses the difficulty of correctly identifying both peptides of a cross-linked product through a complementary mass modification strategy in which each peptide is searched and identified separately

  • We demonstrate this strategy with an analysis of RNA polymerase II

  • This approach has recently been applied to modeling the RNA Pol II preinitiation complex [4], several chromatin remodeling complexes [5, 6], the 26S proteasome [7], and the Mediator middle module [8]; solving the subunit arrangement of the TCP1 ring complex [9, 10]; modeling the electron density map of the Mediator head module [11]; and investigating the binding sites of ribosomal protein S1 to the 30S ribosome [12] and the general transcription factor TFIIF to RNA polymerase II [13]

Summary

EXPERIMENTAL PROCEDURES

Cross-linking of Pol II. RNA pol II was purified from Saccharomyces cerevisiae as previously described [11]. 60 µg of pol II ...

... The current peak is removed and the look-back step is repeated. Using this process, Protein Prospector will match fragments of any charge state up to that of the precursor ion, provided it is possible to determine the charge of the peak.

A second set of searches was performed against a list of protein accession numbers identified in the sample on the basis of unmodified peptides, plus sequence-randomized versions of these entries, for a total of 1512 entries.

The classification score used in this analysis was S.D. − pep2.pExp, where S.D. is the difference in score between the cross-linked result and the best match to a single (non-cross-linked) peptide, and pep2.pExp is the log of the expectation value of the least confident peptide identification.

Results were combined using Prospector's Search Compare program, followed by removal of the lower-confidence match for a particular spectrum when both searches produced an assignment.
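As a rough illustration of the classification score and the per-spectrum filtering described above, the sketch below computes S.D. − pep2.pExp and keeps one match per spectrum. Several things here are assumptions rather than details from the text: the `CrosslinkMatch` record and function names are hypothetical, the logarithm is taken as base 10, and "lower confidence" is judged by the classification score itself. It does not reproduce what Search Compare does internally.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosslinkMatch:
    """Hypothetical record for one cross-linked assignment of a spectrum."""
    spectrum_id: str
    score_difference: float   # S.D.: cross-link score minus best single-peptide score
    pep2_expectation: float   # expectation value of the less confident peptide


def classification_score(match):
    """Return S.D. - pep2.pExp, with pExp taken as log10(expectation value).

    The base-10 logarithm is an assumption of this sketch. A smaller
    expectation value gives a more negative log and so a higher score.
    """
    return match.score_difference - math.log10(match.pep2_expectation)


def keep_best_per_spectrum(matches):
    """Keep only the highest-scoring match for each spectrum.

    Assumes that confidence is compared via classification_score when
    two searches both assigned the same spectrum.
    """
    best = {}
    for m in matches:
        current = best.get(m.spectrum_id)
        if current is None or classification_score(m) > classification_score(current):
            best[m.spectrum_id] = m
    return best
```

Because the expectation value enters through its logarithm, a tenfold improvement in the less confident peptide's expectation value raises the score by one unit under the base-10 assumption.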

RESULTS
DISCUSSION