Modeling Complex Diseases Using Discriminative Network Fragments

Ambuj K Singh

doi:10.1007/978-3-642-30191-9_23

Abstract

A number of complex diseases are network-based, in that a set of pathways needs to be perturbed. Understanding the logic of such perturbations from high-throughput datasets such as next-generation sequencing or gene expression can help in elucidating the nature of diseases and their multiple states. I will discuss a recent approach that mines multiple annotated networks to find the small network fragments that drive the state of the entire network. In this approach, a gene/protein interaction network is used as the underlying structure and high-throughput data is used to annotate the nodes of the network. The global state of a network is annotated as normal or diseased.

Mining discriminative subgraphs from large networks is a powerful mechanism for identifying network components that are influential in determining the global network state. This task differs from learning classification models in two ways. The first difference is in semantics. A traditional classifier operates on unstructured data where each feature represents an axis in a high-dimensional space. In our problem, each feature (an annotated node in a graph) is constrained within a structure, and the network event being modeled evolves through the global structure. A traditional classifier analyzes only the statistical significance of a feature and ignores the underlying structure. Second, in our problem, the goal is to mine discriminative subgraphs, each of which aids in predicting the global network state. The proposed technique therefore operates at the level of discriminative subgraphs instead of individual nodes. To achieve these properties, we design a technique for mining network-constrained decision trees that learn network-encoded logic functions to predict the global network state. To tackle the exponential subgraph search space, we formulate the idea of an Edit Map, on which we perform Metropolis-Hastings sampling to drastically reduce the computation cost.

We have performed extensive experiments to evaluate the efficiency and effectiveness of our method. Our results show that the proposed algorithm achieves an accurate approximation of the optimal answer set. Furthermore, the method outperforms the current state-of-the-art classifiers developed for gene/protein interaction networks.
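
The sampling idea can be illustrated with a small sketch. The abstract does not define the Edit Map, so the code below assumes one plausible reading: each state is a connected subgraph, and two states are neighbors when they differ by adding or removing a single node. The toy `score` function, the `beta` temperature, the `max_size` cap, and the five-node example network are all hypothetical and are not taken from the paper. The sketch runs Metropolis-Hastings over these states so that high-scoring subgraphs are visited most often, without enumerating the full subgraph space.

```python
import math
import random

def is_connected(nodes, adj):
    """True if the non-empty node set induces a connected subgraph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def edits(state, adj, max_size):
    """Connected subgraphs reachable by one node insertion or deletion
    (an assumed stand-in for the neighbors of a state in the Edit Map)."""
    out = []
    if len(state) < max_size:
        frontier = {v for u in state for v in adj[u]} - state
        out += [state | {v} for v in sorted(frontier)]
    if len(state) > 1:
        out += [state - {u} for u in sorted(state)
                if is_connected(state - {u}, adj)]
    return out

def score(state, samples):
    """Hypothetical discriminative score: gap in mean node activity
    between diseased (label 1) and normal (label 0) networks."""
    def mean_activity(label):
        rows = [x for x, y in samples if y == label]
        return sum(x[n] for x in rows for n in state) / (len(rows) * len(state))
    return abs(mean_activity(1) - mean_activity(0))

def mh_sample(adj, samples, start, steps=2000, beta=5.0, max_size=3, seed=0):
    """Metropolis-Hastings with target proportional to exp(beta * score).
    Assumes a connected network with at least two nodes and max_size >= 2,
    so every state has at least one neighbor."""
    rng = random.Random(seed)
    state = frozenset(start)
    visits = {}
    for _ in range(steps):
        nbrs = edits(state, adj, max_size)
        cand = rng.choice(nbrs)          # uniform proposal over single edits
        back = edits(cand, adj, max_size)
        ratio = math.exp(beta * (score(cand, samples) - score(state, samples)))
        ratio *= len(nbrs) / len(back)   # Hastings correction for unequal degrees
        if rng.random() < min(1.0, ratio):
            state = cand
        visits[state] = visits.get(state, 0) + 1
    return visits

# Toy path network A-B-C-D-E; nodes C and D are active only in diseased samples.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
samples = [
    ({"A": 0, "B": 1, "C": 1, "D": 1, "E": 0}, 1),
    ({"A": 1, "B": 0, "C": 1, "D": 1, "E": 0}, 1),
    ({"A": 0, "B": 1, "C": 0, "D": 0, "E": 0}, 0),
    ({"A": 1, "B": 0, "C": 0, "D": 0, "E": 0}, 0),
]
visits = mh_sample(adj, samples, start={"A"})
best = max(visits, key=visits.get)
print(sorted(best), visits[best])
```

In this toy setting the most frequently visited state lies within the C-D fragment, which is the only part of the network whose activity separates the two classes. A real application would replace `score` with a measure tied to the classifier being learned, such as the information gain of a decision-tree split.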
