Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. They provide a powerful set of molecular tools and advanced technologies that should make it possible to study efficiently the relationship between the genotypes and phenotypes of individuals. However, the network information gained from EMAP cannot be fully exploited using traditional statistical network models, because genetic networks are heterogeneous: for example, the structural features of one subset of nodes differ from those of the remaining nodes. Exponential-family random graph models (ERGMs) are a family of statistical models that provide a principled and flexible way to describe the structural features (e.g., the density, centrality, and assortativity) of an observed network. However, a single ERGM is not enough to capture this heterogeneity. In this paper, we consider mixture ERGM (MixtureERGM) networks, which model a network with several communities, where each community is described by a single ERGM. The EM algorithm is a classical method for fitting mixture models; however, it becomes very slow when the data are large, as they are in many applications. We adopt an efficient, novel online graph clustering algorithm to classify the graph nodes and estimate the ERGM parameters for the MixtureERGM. In comparison studies, the MixtureERGM outperforms role analysis for network clustering, in which a mixture of exponential-family random graph models is developed for many ego-networks according to their roles. Applications to one genetic interaction network of yeast and two real social networks (provided as supplemental materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2017.2743711) show the wide potential applicability of the MixtureERGM.
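To make the ERGM idea concrete, the following is a minimal sketch of the standard ERGM form, in which the probability of a graph y is proportional to exp(theta · g(y)) for a vector of network statistics g(y). The choice of statistics here (edge count and triangle count) and the function names `stats` and `ergm_probability` are illustrative assumptions, not taken from the paper. The normalizing constant is computed by brute-force enumeration, which is only feasible for very small graphs and is not the online estimation method described above.

```python
from itertools import combinations, product
from math import exp


def stats(n, edges):
    """Return (edge count, triangle count) for an undirected graph on nodes 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    return (len(edge_set), triangles)


def ergm_probability(n, edges, theta):
    """Exact P(Y = y) under an ERGM with weights theta (illustrative, tiny n only)."""
    dyads = list(combinations(range(n), 2))

    def weight(es):
        # Unnormalized weight exp(theta . g(y)) for the graph with edge list es.
        return exp(sum(t * s for t, s in zip(theta, stats(n, es))))

    # Normalizing constant: sum the weight over all 2^(number of dyads) graphs.
    total = sum(
        weight([d for d, on in zip(dyads, mask) if on])
        for mask in product((0, 1), repeat=len(dyads))
    )
    return weight(edges) / total
```

For example, `ergm_probability(3, [(0, 1), (1, 2), (0, 2)], (-1.0, 2.0))` gives the probability of a triangle on three nodes when edges are penalized and closed triangles are rewarded. In a mixture setting, each community would carry its own parameter vector theta.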