Abstract
Large networks, as a form of big data, have received an increasing amount of attention in data science, especially large social networks, which are reaching sizes of hundreds of millions, with daily interactions on the scale of billions. Analyzing and modeling these data to understand the connectivity and dynamics of large networks is therefore important in a wide range of scientific fields. Among popular models, exponential random graph models (ERGMs) have been developed to study these complex networks by directly modeling network structures and features. ERGMs, however, are hard to scale to large networks because maximum likelihood estimation of their parameters can be very difficult, due to the unknown normalizing constant. Alternative strategies based on Markov chain Monte Carlo (MCMC) draw samples to approximate the likelihood, which is then maximized to obtain the maximum likelihood estimator (MLE). These strategies converge poorly because of model degeneracy and cannot be used on large networks. Chatterjee et al. (Ann Stat 41:2428–2461, 2013) propose a new theoretical framework for estimating the parameters of ERGMs by approximating the normalizing constant using an emerging tool in graph theory, namely graph limits. In this paper, we construct a complete computational procedure that builds on their results with practical innovations, is fast, and scales to large networks. More specifically, we evaluate the likelihood via a simple function approximation of the corresponding ERGM's graph limit and iteratively maximize the likelihood to obtain the MLE. We also discuss methods for conducting likelihood ratio tests for ERGMs, as well as related issues. Through simulation studies and real data analysis of two large social networks, we show that our new method outperforms the MCMC-based method, especially when the network is large (more than 100 nodes).
One limitation of our approach, inherited from the result of Chatterjee et al. (Ann Stat 41:2428–2461, 2013), is that it works only for sequences of graphs with a positive limiting density, i.e., dense graphs.
Highlights
There has been growing interest in applying exponential random graph models to social network analysis
We specify the true value of the parameters $\theta$ to be $\theta = (-2, -1, 1)$, which is obtained by rounding the parameter estimates of this exponential random graph model (ERGM) fitted to a small Facebook social network data set
Motivated by the latest developments in graph limit theory, Chatterjee et al. (2013) propose a theoretical framework for estimating ERGMs based on a large-deviation approximation to the normalizing constant
Summary
There has been growing interest in applying exponential random graph models (ERGMs, also known as p* models) to social network analysis (see Frank and Strauss 1986; Robins et al. 2007; Handcock and Gile 2010). Parameter estimation of ERGMs for large networks remains a challenging problem, because the normalizing constant in the likelihood function depends on the parameters of interest and is a summation over all possible graphs on n nodes. A social network can be represented by a graph, in which nodes typically represent individuals and ties (or edges) represent a specified relationship of interest between individuals, such as friendship. Let $\mathcal{G}_n$ be the space of all simple graphs G on n nodes, where simple graphs are undirected graphs with no loops or multiple edges. Given a set of k features $U(G) = (U_1(G), \dots, U_k(G))$ and a vector of real-valued parameters $\theta = (\theta_1, \dots, \theta_k)$, the exponential random graph model (ERGM) assumes that G follows a probability distribution in the following exponential form:
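To give a feel for why the normalizing constant is the bottleneck, here is a minimal Python sketch that computes it by brute force for a very small graph. It rests on several assumptions that are not taken from the text: each graph is given the generic unnormalized weight exp(θ · U(G)), with θ the parameter vector and without any size-dependent scaling that a particular paper may apply; the features are taken to be the edge count and the triangle count; and the function names are hypothetical. It is an illustration of the summation over all simple graphs, not the estimation procedure described above.

```python
from itertools import combinations, product
from math import exp, log


def features(n, edge_flags, pairs):
    """Return (edge count, triangle count) for one simple graph on n nodes.

    The choice of these two features is an assumption made for illustration.
    """
    edges = {pair for pair, on in zip(pairs, edge_flags) if on}
    n_triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )
    return len(edges), n_triangles


def log_normalizing_constant(n, theta):
    """Brute-force log of sum_G exp(theta . U(G)) over all simple graphs on n nodes.

    Assumes the generic, unscaled weight exp(theta . U(G)).
    """
    pairs = list(combinations(range(n), 2))  # all possible undirected edges
    total = 0.0
    # Each graph corresponds to one on/off assignment over the possible edges.
    for edge_flags in product((0, 1), repeat=len(pairs)):
        u = features(n, edge_flags, pairs)
        total += exp(sum(t * x for t, x in zip(theta, u)))
    return log(total)


def num_simple_graphs(n):
    """Number of terms in the sum: 2 ** (n choose 2)."""
    return 2 ** (n * (n - 1) // 2)


if __name__ == "__main__":
    theta = (-1.0, 0.5)  # illustrative values only, not taken from the paper
    print(log_normalizing_constant(4, theta))
    print(num_simple_graphs(4), num_simple_graphs(20))
```

The sum has 2^(n(n-1)/2) terms, so it is already 64 terms at n = 4 and 2^190 terms at n = 20. Exact evaluation is therefore out of reach for networks of realistic size, which is why approximations to the normalizing constant are needed.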