Abstract
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here applied to large-scale experimental datasets where the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex.
Highlights
The majority of the protein monomers in the E. coli proteome (3,189 of 4,391) have high-quality monomer alignments and are amenable to EVcomplex2 (Methods). We verify alignment quality by testing the precision of the top evolutionary couplings (ECs) for those monomers with an experimental structure, finding that 78% have reasonable precision of the top ECs (60% for the top L ECs, where L is the protein sequence length) (Supplementary Data 1, Supplementary Fig. 1).
We find that 42% of proteins in the human proteome can be aligned with medium sequence diversity in at least one domain, and 20% of proteins can be aligned with the high diversity cutoff used for E. coli in all of their domains.
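The precision check on the top L ECs mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `top_ec_precision`, the input layout (scored residue pairs plus a set of contacts derived from an experimental structure), and the minimum sequence separation of 6 are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

ScoredPair = Tuple[int, int, float]  # (residue i, residue j, coupling score)
Contact = Tuple[int, int]            # (i, j) with i < j, taken from a solved structure


def top_ec_precision(
    scored_pairs: Iterable[ScoredPair],
    contacts: Set[Contact],
    n: int,
    min_separation: int = 6,  # assumed cutoff, not taken from the paper
) -> float:
    """Fraction of the n highest-scoring residue pairs that are structural contacts."""
    # Drop pairs that are close in sequence; they are near each other in any fold.
    candidates = [
        (i, j, score)
        for i, j, score in scored_pairs
        if abs(i - j) >= min_separation
    ]
    # Rank the remaining pairs from strongest to weakest coupling.
    candidates.sort(key=lambda pair: pair[2], reverse=True)
    top = candidates[:n]
    if not top:
        return 0.0
    # Count ranked pairs that appear in the contact set (order-independent).
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in contacts)
    return hits / len(top)


# Toy data (hypothetical): pair (3, 4) is ignored because it is too close in sequence.
example_ecs = [(1, 9, 0.9), (2, 10, 0.8), (3, 4, 0.95), (1, 8, 0.4), (2, 9, 0.3)]
example_contacts = {(1, 9), (1, 8)}
precision_top2 = top_ec_precision(example_ecs, example_contacts, n=2)
```

For the check described in the text, `n` would be set to the protein sequence length L, and the resulting precision compared against the chosen threshold.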
Summary
Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Many experimental[5,6,7] and computational[8,9,10] methods identify which proteins interact within an organism at scale, but the only computational methods able to determine both interactions and their precise, residue-resolution interfaces are based on coevolution. Coevolutionary methods such as EVcouplings[11,12] and others[13] have been successful in determining 3D structures by leveraging the vast corpus of natural sequences, using probabilistic graphical models to infer candidate pairs of interacting residues. To demonstrate the potential for eukaryotic complexes, we show successful predictions for eukaryotic-exclusive complexes, including the human spliceosome.