Abstract

Determining the three-dimensional arrangement of proteins in a complex is highly beneficial for uncovering mechanistic function and for interpreting genetic variation in the genes that encode the complex's subunits. There are several methods for determining co-complex interactions between proteins, among them co-fractionation / mass spectrometry (CF-MS), but it remains difficult to identify directly contacting subunits within a multi-protein complex. Correlation analysis of CF-MS profiles shows promise in detecting protein complexes as a whole but is limited in its ability to infer direct physical contacts among proteins in sub-complexes. To identify direct protein-protein contacts within human protein complexes, we learn a sparse conditional dependency graph from approximately 3,000 CF-MS experiments on human cell lines. We show substantial performance gains in estimating direct interactions compared to correlation analysis on a benchmark of large protein complexes with solved three-dimensional structures. We demonstrate the method’s value in determining the three-dimensional arrangement of proteins by making predictions for complexes without known structure (the exocyst and tRNA multi-synthetase complex) and by establishing evidence for the structural position of a recently discovered component of the core human EKC/KEOPS complex, GON7/C14ORF142, providing a more complete 3D model of the complex. Direct contact prediction provides easily calculable additional structural information for large-scale protein complex mapping studies and should be broadly applicable across organisms as more CF-MS datasets become available.

Highlights

  • Many proteins assemble into large macromolecular complexes with essential cellular functions

  • Conserved protein complexes are estimated to number in the thousands, but the vast majority of these remain structurally elusive to traditional structural biology techniques

  • Advances in proteomics technologies have allowed for the high-throughput identification of protein complexes across the tree of life, including large-scale affinity purification mass spectrometry (AP-MS) datasets [1,2,3] as well as high-throughput co-fractionation mass spectrometry (CF-MS) datasets comprising thousands of experiments across humans, metazoans and prokaryotes [4,5,6,7]

Introduction

Many proteins assemble into large macromolecular complexes with essential cellular functions. In the CF-MS approach, cellular lysate is biochemically fractionated by multiple, non-denaturing chromatographic methods, and complexes are inferred bioinformatically in a machine-learning framework that uses correlations of the resulting protein elution profiles as a prominent feature. Although this approach has primarily been used to identify the component subunits of complexes, we previously observed that the correlation structure of the protein elution profiles also reveals structural information about the complexes [6]. Other computational approaches have been proposed to identify direct contacts by analyzing the co-occurrence of proteins in mass spectrometry experiments, but they have only been applied to AP-MS datasets [15].
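
The contrast the abstract draws between correlation analysis and a conditional dependency graph can be illustrated with a toy example. The sketch below is not the authors' method or data: it assumes three hypothetical proteins A, B and C whose simulated "elution profiles" are Gaussian, with A and C each contacting only B. Plain correlation links A and C strongly because both track B, whereas the first-order partial correlation (the dependence of A and C after conditioning on B) is close to zero. All function names and parameters here are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_chain(n_fractions=2000, noise=0.5, seed=0):
    """Toy profiles for a chain A - B - C (assumed, not real CF-MS data).

    B is drawn at random; A and C each follow B plus independent noise,
    so A and C are related only through B.
    """
    rng = random.Random(seed)
    b = [rng.gauss(0, 1) for _ in range(n_fractions)]
    a = [v + rng.gauss(0, noise) for v in b]
    c = [v + rng.gauss(0, noise) for v in b]
    return a, b, c


a, b, c = simulate_chain()
r_ab, r_bc, r_ac = pearson(a, b), pearson(b, c), pearson(a, c)

# Conditional dependence of each pair given the third protein.
p_ac_given_b = partial_corr(r_ac, r_ab, r_bc)  # indirect pair
p_ab_given_c = partial_corr(r_ab, r_ac, r_bc)  # direct pair

print(f"corr(A,C)     = {r_ac:.2f}")
print(f"pcorr(A,C|B)  = {p_ac_given_b:.2f}")
print(f"pcorr(A,B|C)  = {p_ab_given_c:.2f}")
```

In this toy setting, keeping only pairs whose partial correlation is clearly nonzero recovers the two direct edges A-B and B-C and drops the indirect A-C edge. A sparse conditional dependency graph generalizes the same idea to many proteins by conditioning on all others at once and penalizing weak edges.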

