Abstract

Graphical models, which can be viewed as a marriage of graph theory and probability theory, provide a powerful formalism for multivariate statistical modeling of complex systems. Graphical models harness the complexity of large-scale systems by representing the statistical relations among a large number of variables in a compact manner. This compact structure can in turn be leveraged to derive highly efficient techniques for data analysis. However, research on graphical models for continuous variables has so far focused mostly on Gaussian statistics. Unfortunately, this limitation severely handicaps the utility of graphical models in real-world applications, which often involve non-Gaussian variables. Data in physics and the earth sciences, for instance, often consist of positive-valued quantities (e.g., amplitude, energy, and magnitude) and thus cannot be described accurately by Gaussian distributions. In addition, the behavior of extreme events, such as hurricanes and floods, is theoretically governed by extreme-value distributions with fat tails rather than by Gaussian distributions. In this thesis, we move beyond Gaussian graphical models and propose a portfolio of novel graphical models for non-Gaussian data. Such graphical models are powerful tools for solving real-life inference problems while avoiding the restrictive assumptions of Gaussian statistics, hence yielding more reliable solutions. The first part of the thesis deals with "nominal" (non-extremal) data. This type of data follows neither Gaussian nor fat-tailed distributions. Gaussian copulas are employed here to tie marginal distributions of any kind (Gaussian, non-Gaussian, and even non-parametric) together to form a joint distribution. Through the language of graphical models, we further impose a sparse dependence structure on the resulting non-Gaussian distribution, leading to sparse copula Gaussian graphical models (CGGM).
Such models have the same mathematical convenience as Gaussian graphical models, yet are applicable to marginally non-Gaussian data. Along this line, we proceed to construct hidden variable copula Gaussian graphical models (HVCGGM) and discrete copula Gaussian graphical models (DCGGM). The two models are applicable to different practical scenarios. Specifically, the HVCGGM yields sparse graphical models when data are unavailable for some relevant variables; the DCGGM extends CGGM to discrete data in a straightforward manner. Since real data are often non-stationary and statistical models designed for stationary data may not yield accurate results, we further consider learning graphical models for piecewise-stationary data. In other words, we first detect change points in the time series, and then infer graphical models within each stationary segment. Besides modeling nominal data, we also build graphical models to handle extreme events. Extreme events are often modeled in two stages: first the extreme-value marginal distributions are estimated, and then the joint distribution of extreme values is constructed based…
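
As an illustration of the Gaussian-copula idea described in the abstract, the following minimal Python sketch maps each variable to Gaussian scores through its empirical CDF and then estimates the correlation of the resulting latent Gaussian variables. It is a sketch under stated assumptions, not the estimator used in the thesis: the function names (`normal_scores`, `pearson`, `latent_correlation`) and the synthetic lognormal data are hypothetical, the marginals are treated non-parametrically via ranks, and the data are assumed continuous without ties.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """Map one variable to Gaussian scores through its empirical CDF.

    Assumes continuous data without ties (a simplification for this sketch).
    """
    n = len(values)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the empirical CDF strictly inside (0, 1)
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def latent_correlation(columns):
    """Correlation matrix of the Gaussian scores, one column per variable."""
    z = [normal_scores(c) for c in columns]
    p = len(z)
    return [[pearson(z[i], z[j]) for j in range(p)] for i in range(p)]


# Synthetic positive-valued (lognormal) data; the underlying Gaussian
# variables are constructed to have correlation 0.8.
random.seed(0)
g1 = [random.gauss(0.0, 1.0) for _ in range(2000)]
g2 = [random.gauss(0.0, 1.0) for _ in range(2000)]
x = [math.exp(a) for a in g1]
y = [math.exp(0.8 * a + 0.6 * b) for a, b in zip(g1, g2)]

corr = latent_correlation([x, y])
print(corr[0][1])  # estimate of the latent Gaussian correlation
```

Because the rank transform is invariant to monotone changes of each variable, the estimate depends only on the dependence structure and not on the non-Gaussian marginals. Imposing sparsity would require a further step, for example a penalized estimate of the inverse of this correlation matrix; that step is not shown here, and the thesis may use a different estimator.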
