Protein complexes are major components of cellular organization. Based on large-scale protein complex data, we present the first statistical procedure for finding insightful substructures in protein complexes: we identify protein subcomplexes (SCs), i.e., multiprotein assemblies that occur in several different protein complexes. Four protein complex datasets of different origin and variable reliability are analyzed separately. Our method identifies well-characterized protein assemblies with known functions, thereby confirming the utility of the procedure. In addition, we identify hitherto unknown functional entities consisting either of proteins of unknown function or of proteins with differing functional annotations. We show that SCs are more reliable protein assemblies than the original complexes. Finally, we demonstrate unique properties of subcomplex proteins that underline the distinct roles of SCs: (i) SCs are functionally and spatially more homogeneous than complete protein complexes, a property we use to predict functional roles and subcellular localizations for previously unannotated proteins; (ii) the abundance of subcomplex proteins is less variable than that of other proteins; (iii) SCs are enriched in essential and synthetic lethal proteins; and (iv) mutations in SC proteins have larger fitness effects than mutations in other proteins.
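To make the notion of a subcomplex concrete, the sketch below enumerates candidate SCs as protein sets of at least two members shared by two or more complexes, and records every complex that contains each candidate. This is only an illustration of the definition, not the authors' procedure: the statistical assessment mentioned in the abstract is not reproduced, and the function name `candidate_subcomplexes`, the `min_size` threshold and the toy complexes are hypothetical.

```python
from itertools import combinations

def candidate_subcomplexes(complexes, min_size=2):
    """Hypothetical helper: map each protein set shared by at least two
    complexes (and with at least `min_size` members) to the sorted names
    of all complexes that contain it. No statistical filtering is applied."""
    shared = set()
    # Every pairwise overlap large enough to be a multiprotein assembly.
    for (_, a), (_, b) in combinations(complexes.items(), 2):
        common = frozenset(a & b)
        if len(common) >= min_size:
            shared.add(common)
    # A candidate may be contained in more complexes than the pair that produced it.
    return {
        cand: sorted(name for name, members in complexes.items() if cand <= members)
        for cand in shared
    }

# Toy, made-up complexes (protein names are placeholders).
complexes = {
    "C1": {"A", "B", "C", "D"},
    "C2": {"A", "B", "C", "E"},
    "C3": {"A", "B", "F"},
    "C4": {"G", "H"},
}
result = candidate_subcomplexes(complexes)
# {A, B, C} is found in C1 and C2; {A, B} is found in C1, C2 and C3.
```

Pairwise intersections keep the sketch short. A real analysis would also need to handle nested candidates such as {A, B} inside {A, B, C} and to test whether an overlap is larger than expected by chance.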