The topological and compositional complexity of protein-protein networks is assessed in a variety of ways using graph theory and information theory. The methodology is borrowed from mathematical chemistry and includes complexity descriptors such as substructure count, overall connectivity, walk count, and the information content of various vertex distributions. The approach is applied to the (incomplete) proteome of Saccharomyces cerevisiae, which contains 232 protein complexes comprising a total of 1,440 proteins. The proteome network and each of its nine functional subsets of protein complexes are disconnected graphs, each containing a number of noninteracting species and a major component. Vertices in these graphs represent protein complexes, and the weight of an edge between two vertices is the number of proteins shared by the respective complexes. The major component is a highly connected, 'small-world' network in which the average vertex distance between protein complexes does not exceed 2.2 (2.4 for the entire proteome) and the maximum distance does not exceed 4 (5 for the proteome). The vertex degree distribution in the major proteome component, which has 199 complexes, follows the power law P(k) ∼ k^(−γ), with γ ≈ 1.7. The analysis of the functional organization of the yeast proteome shows that, for any pair of biological functions, there always exist many proteins that can perform both functions. Potential applications of the quantitative proteome descriptors discussed include quantitative relationships between the structure and biological action of dynamic protein complexes in a changing environment, identification of targets for markers and drugs, and system analysis and comparative studies of proteomes.
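The graph construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the complex and protein names are hypothetical toy data, and distances are assumed to be unweighted hop counts, which the abstract does not specify. The sketch builds the shared-protein graph, extracts the major component, and computes the average and maximum vertex distance together with the vertex degree counts.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy data (not from the paper): complex name -> member proteins.
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P2", "P3", "P4"},
    "C3": {"P4", "P5"},
    "C4": {"P5", "P6"},
    "C5": {"P7"},  # shares no protein with any other complex
}


def build_graph(complexes):
    """Return a weighted adjacency map; weight = number of shared proteins."""
    graph = {name: {} for name in complexes}
    for a, b in combinations(complexes, 2):
        shared = len(complexes[a] & complexes[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def connected_components(graph):
    """Return the connected components as a list of vertex sets."""
    seen, comps = set(), []
    for start in graph:
        if start not in seen:
            comp = set(hop_distances(graph, start))
            seen |= comp
            comps.append(comp)
    return comps


def distance_stats(graph, component):
    """Average and maximum hop distance over all vertex pairs in a component."""
    total = pairs = longest = 0
    for v in component:
        for w, d in hop_distances(graph, v).items():
            if w != v:
                total += d
                pairs += 1
                longest = max(longest, d)
    return (total / pairs if pairs else 0.0), longest


graph = build_graph(complexes)
major = max(connected_components(graph), key=len)
avg, longest = distance_stats(graph, major)
# Number of vertices in the major component having each degree k.
degree_counts = Counter(len(graph[v]) for v in major)

print("major component:", sorted(major))
print("average distance:", round(avg, 3), "maximum distance:", longest)
print("degree counts:", dict(degree_counts))
```

On real data, the degree counts from the major component would be the input to a power-law fit of P(k); that fitting step is left out here because the abstract does not describe how the exponent was estimated.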