A network of modular protein complexes inside a cell coordinates many biological processes and is known as a protein-protein interaction (PPI) network. A PPI network can be modeled as a graph in which nodes represent proteins, edges represent interactions among them, and subgraphs represent protein complexes. Previous methods for mining protein complexes from PPI networks focused mainly on a few topological features, such as density and degree statistics, based on the assumption that proteins inside a complex interact heavily with each other and thus form dense subgraphs. While this assumption holds for some complexes, it does not hold for many others. Most previous studies also ignore the biological information in protein amino acid sequences, which can be used to estimate how likely two proteins are to interact in performing a specific biological function. There is therefore a need for algorithms that consider both topological and biological features in order to correctly identify protein complexes with varying topological structures and biological patterns inside a PPI network. In this paper, we present an algorithm for detecting protein complexes from interaction graphs. Using graph topological patterns and biological properties as features, we model each complex subgraph with decision tree learners. We use a training set of known complexes to construct decision trees in depth-first and best-first manner using a divide-and-conquer strategy. Splitting criteria such as information gain and Gini gain are used in the tree expansion process. The training set is divided into subsets, and each subset is represented as a branch of the tree. Pruning techniques are used to reduce the size of the tree. We applied our method to yeast protein interaction data on two benchmark data sets, MIPS and CYC2008. According to our results, decision trees achieved a considerable improvement over clique-based algorithms in their ability to recover known complexes by using integrated biological and topological properties.
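As a rough illustration of the splitting criteria named above, the following sketch computes information gain and Gini gain for a single threshold split on one feature. The feature (subgraph density), the toy values, and all function names are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity reduction from splitting samples at `value <= threshold`.

    With `impurity=entropy` this is information gain;
    with `impurity=gini` it is Gini gain.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


# Hypothetical toy data: density of candidate subgraphs and whether each
# one is a known complex (1) or not (0).
densities = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
is_complex = [1, 1, 1, 0, 0, 0]

info_gain = split_gain(densities, is_complex, 0.5, entropy)
gini_gain = split_gain(densities, is_complex, 0.5, gini)
print(info_gain, gini_gain)
```

In a tree learner, a gain of this kind would be evaluated for every candidate feature and threshold, and the split with the highest gain would be used to divide the training subset into branches.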