Non-parametric Bayesian Latent Factor Models for Network Reconstruction

Sikun Yang

doi:10.25534/tuprints-00009695

Abstract

This thesis is concerned with the statistical learning of probabilistic models for graph-structured data. It addresses both the theoretical aspects of network modeling, such as learning appropriate representations for networks, and the practical difficulties of developing inference algorithms for the proposed models.

The first part of the thesis addresses discrete-time dynamic network modeling. The objective is to learn the common structure and the underlying interaction dynamics among the entities involved in the observed temporal network. Two probabilistic modeling frameworks are developed. First, a Bayesian nonparametric framework is proposed to capture the static latent community structure and the evolving node-community memberships over time. More specifically, the hierarchical gamma process is used to capture the underlying intra-community and inter-community interactions. The appropriate number of latent communities is estimated automatically via the inherent shrinkage mechanism of the hierarchical gamma process prior. Gamma Markov processes are constructed to capture the evolving node-community relations. Because the Bernoulli-Poisson link function maps the binary edges to the latent parameter space, the computational cost of the proposed method scales with the number of non-zero edges. Hence, the method is particularly well suited to modeling large sparse networks. Second, a time-dependent hierarchical gamma process dynamic network model is proposed to capture the birth and death dynamics of the underlying communities. For performance evaluation, the proposed methods are compared with state-of-the-art statistical network models on both synthetic and real-world data.

In the second part of the thesis, the main objective is to analyze continuous-time event-based dynamic networks. A fundamental problem in modeling such continuously generated temporal interaction events is to capture the reciprocal nature of the interactions among entities: an action performed by one individual toward another increases the probability that an action of the same type is returned. Hence, the mutually exciting Hawkes process is used to capture the reciprocity between each pair of individuals involved in the observed dynamic network. In particular, the base rate of the Hawkes process is built upon the latent parameters inferred using the hierarchical gamma process edge partition model, so as to capture the underlying community structure. Moreover, each interaction event between two individuals is augmented with a pair of latent variables, referred to as latent patterns, that indicate which of their communities led to the occurrence of that interaction. Accordingly, the proposed model allows the excitatory effect of each interaction on the opposite direction to be determined by its latent patterns. Efficient Gibbs sampling and Expectation Maximization algorithms are developed to perform inference. Finally, evaluations on real-world data demonstrate the interpretability and competitive performance of the model compared with state-of-the-art methods.

In the third part of the thesis, the objective is to analyze the common structure of multiple related data sources under a generative framework. First, a Bayesian nonparametric group factor analysis method is developed to factorize multiple related groups of data into a common latent factor space. The hierarchical beta Bernoulli process is exploited to induce sparsity over the group-specific factor loadings and thereby strengthen model interpretability. A collapsed variational inference scheme is proposed to perform efficient inference for large-scale data analysis in real-world applications. Moreover, a Poisson gamma memberships framework is investigated for joint modeling of network and related node features.
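
The claim that the Bernoulli-Poisson link lets inference scale with the number of non-zero edges can be illustrated with a minimal Python sketch. It assumes the standard form of the link, b = 1(m >= 1) with m ~ Poisson(rate), and a bilinear edge rate over community memberships; the rate formula and the names `edge_rate`, `edge_probability`, `sample_latent_count`, `phi_i`, `phi_j` and `lam` are illustrative assumptions, not taken from the abstract. It is not the thesis's inference algorithm, only the single step where sparsity helps.

```python
import math
import random


def edge_rate(phi_i, phi_j, lam):
    """Assumed bilinear rate: sum_k sum_l phi_i[k] * lam[k][l] * phi_j[l]."""
    return sum(
        phi_i[k] * lam[k][l] * phi_j[l]
        for k in range(len(phi_i))
        for l in range(len(phi_j))
    )


def edge_probability(rate):
    """P(b = 1) = 1 - exp(-rate) when b = 1(m >= 1) and m ~ Poisson(rate)."""
    return -math.expm1(-rate)


def sample_latent_count(edge, rate, rng):
    """Draw the latent count m given the observed binary edge.

    A zero edge forces m = 0, so no sampling is needed for it; only
    non-zero edges require a draw from a zero-truncated Poisson.
    Intended for moderate rates (exp(-rate) must not underflow).
    """
    if edge == 0:
        return 0
    if rate <= 0.0:
        raise ValueError("a non-zero edge requires a positive rate")
    # Inverse-CDF search over k = 1, 2, ... of the zero-truncated Poisson.
    k = 1
    term = rate * math.exp(-rate) / (-math.expm1(-rate))  # P(m = 1 | m >= 1)
    cdf = term
    u = rng.random()
    while cdf < u and term > 0.0:
        k += 1
        term *= rate / k  # P(m = k | m >= 1) from P(m = k - 1 | m >= 1)
        cdf += term
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    phi_i, phi_j = [1.0, 0.0], [0.0, 2.0]  # hypothetical memberships
    lam = [[0.5, 0.1], [0.1, 0.5]]         # hypothetical community weights
    rate = edge_rate(phi_i, phi_j, lam)
    print(edge_probability(rate), sample_latent_count(1, rate, rng))
```

In this sketch a sampler would call `sample_latent_count` only for observed edges, since every zero entry of the adjacency matrix has its latent count fixed at zero.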

Full Text