Abstract
Crosslinking mass spectrometry (XL-MS) is an increasingly popular technique for modeling protein monomers and complexes. The distance restraints obtained from these experiments can be used alone or as part of an integrative modeling approach that incorporates data from many sources. However, modeling practices vary, and the differences in their usefulness are not clear. Here, we develop a new scoring procedure for models based on crosslink data: the Matched and Nonaccessible Crosslink score (MNXL). We compare its performance with that of other commonly used scoring functions (Number of Violations and Sum of Violation Distances) on a benchmark of 14 protein domains, each with 300 corresponding models (at various levels of quality) and associated, previously published, experimental crosslinks (XLdb). The distances between crosslinked lysines are calculated either as Euclidean distances or as Solvent Accessible Surface Distances (SASD) using a newly developed method (Jwalk). MNXL takes into account whether a crosslink is nonaccessible, i.e., whether an experimentally observed crosslink has no corresponding SASD in a model because of buried lysines. Nonaccessibility alone is shown to have a significant impact on modeling performance, yet it is not considered at all when only Euclidean distances are used. Additionally, a comparison between modeling with SASD and with Euclidean distance shows that SASD is superior, even when factoring out the effect of the nonaccessible crosslinks. Our benchmarking also shows that MNXL outperforms the other tested scoring functions in terms of precision and correlation to Cα-RMSD from the crystal structure. Finally, we test MNXL at different levels of crosslink recovery (i.e., the percentage of all theoretical crosslinks that are experimentally observed) and set a target recovery of ∼20%, beyond which the performance plateaus.
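To make the baseline scoring functions named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a violation is a crosslink whose model distance exceeds a fixed cutoff, that the Sum of Violation Distances adds up the excess over that cutoff, and that a nonaccessible crosslink is represented by a missing distance (`None`). The function names and the cutoff values are hypothetical and chosen only for illustration; the MNXL formula itself is not given in this text and is therefore not reproduced.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Crosslink = Tuple[int, int]  # residue numbers of the two crosslinked lysines


def euclidean_distances(coords: Dict[int, Coord],
                        crosslinks: Sequence[Crosslink]) -> List[float]:
    """Straight-line distance (in angstroms) for each crosslinked residue pair."""
    return [math.dist(coords[i], coords[j]) for i, j in crosslinks]


def number_of_violations(distances: Sequence[Optional[float]],
                         cutoff: float) -> int:
    """Count crosslinks whose model distance exceeds the cutoff.

    Entries that are None (no path found) are skipped here and counted
    separately by count_nonaccessible.
    """
    return sum(1 for d in distances if d is not None and d > cutoff)


def sum_of_violation_distances(distances: Sequence[Optional[float]],
                               cutoff: float) -> float:
    """Sum the excess over the cutoff for every violated crosslink.

    Assumption: the violation distance is (distance - cutoff).
    """
    return sum(d - cutoff for d in distances if d is not None and d > cutoff)


def count_nonaccessible(distances: Sequence[Optional[float]]) -> int:
    """Count crosslinks with no surface path in the model (distance is None)."""
    return sum(1 for d in distances if d is None)


def crosslink_recovery(n_observed: int, n_theoretical: int) -> float:
    """Percentage of all theoretical crosslinks that were observed."""
    if n_theoretical <= 0:
        raise ValueError("n_theoretical must be positive")
    return 100.0 * n_observed / n_theoretical


# Toy example: three residues and two observed crosslinks (made-up coordinates).
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 40.0)}
toy_crosslinks = [(1, 2), (1, 3)]
toy_distances = euclidean_distances(toy_coords, toy_crosslinks)

ILLUSTRATIVE_CUTOFF = 30.0  # hypothetical cutoff in angstroms, not from the text
print(number_of_violations(toy_distances, ILLUSTRATIVE_CUTOFF))
print(sum_of_violation_distances(toy_distances, ILLUSTRATIVE_CUTOFF))
```

With surface distances, the same functions would take a list in which buried lysines yield `None`. A Euclidean calculation always returns a number, so it cannot flag such crosslinks, which is the limitation that the nonaccessible term is meant to address.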
Highlights
From the ‡Institute of Structural and Molecular Biology, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK; §Gene Center Munich, Ludwig-Maximilians-Universität (LMU) Munich, Feodor-Lynen-Strasse 25, 81377 Munich, Germany; ¶Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, UK
We describe our scoring function, the Matched and Nonaccessible Crosslink score (MNXL), which scores each crosslink obtained from crosslinking experiments based on its calculated Solvent Accessible Surface Distance (SASD)
Recalculating SASDs from XLdb: XLdb is a database of experimentally observed crosslinks that has been curated from the literature (10)
Summary
In order to better model proteins using crosslinks, we have developed a new scoring method to evaluate models, which takes advantage of the experimentally observed distribution of distances between crosslinked residues in XLdb, and used it to investigate the effects of using SASD over Euclidean distance.