A Statistically Efficient and Scalable Method for Exploratory Analysis of High-Dimensional Data

Mohammad S Rahman, Gholamreza Haffari

doi:10.1007/s42979-020-0064-2

Abstract

Discovering associations among variables is an important data mining task. The associations can be considered as statistical dependencies among random variables, expressed as the structure of an underlying probabilistic graphical model. Current methods for graphical model structure discovery either do not scale well to datasets with large sample sizes or suffer from high false discovery rates when the number of dimensions is much larger than the sample size. In this paper, we propose a scalable and statistically efficient approach for graphical model structure discovery for multivariate data involving continuous variables. Our approach uses a minimum message length (MML)-based objective, for which we design a greedy algorithm that incrementally adds to the graphical model the edges that most improve the MML-based score. We present extensive empirical results on synthetic data with varying sample sizes, numbers of variables, clique sizes and inverse correlation coefficients, and show that our method outperforms strong baselines in terms of both speed and the accuracy of the predicted associations among the random variables in the graphical model. We also report that our method performs very well on the AML and BRCA cancer datasets and on other real-life datasets.
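The greedy search described in the abstract can be illustrated with a small sketch. The abstract does not give the MML-based score itself, so this example swaps in a different, well-known criterion: a BIC-penalised node-wise Gaussian regression score (an assumption, not the authors' objective). Only the greedy loop, which repeatedly adds the single edge with the largest score improvement and stops when no edge improves the score, mirrors the abstract. The names `node_score` and `greedy_structure` are hypothetical.

```python
import itertools

import numpy as np


def node_score(X, i, neighbors, penalty):
    """Gaussian log-likelihood of regressing column i on its neighbours,
    minus a BIC-style penalty per parameter (stand-in for the MML score)."""
    n = X.shape[0]
    y = X[:, i]
    if neighbors:
        # Intercept column plus the neighbouring variables as predictors.
        A = np.column_stack([np.ones(n), X[:, sorted(neighbors)]])
    else:
        A = np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    sigma2 = max(rss / n, 1e-12)  # MLE of the residual variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = A.shape[1] + 1  # regression coefficients + noise variance
    return loglik - penalty * k


def greedy_structure(X, max_edges=None):
    """Greedily add the undirected edge (i, j) that most improves the total
    node-wise score; stop when no remaining edge gives a positive gain."""
    n, p = X.shape
    penalty = 0.5 * np.log(n)  # BIC penalty per parameter (assumption)
    nbrs = {i: set() for i in range(p)}
    scores = {i: node_score(X, i, nbrs[i], penalty) for i in range(p)}
    edges = []
    while max_edges is None or len(edges) < max_edges:
        best_gain, best_edge = 0.0, None
        for i, j in itertools.combinations(range(p), 2):
            if j in nbrs[i]:
                continue
            # Adding (i, j) only changes the score terms of nodes i and j.
            gain = (node_score(X, i, nbrs[i] | {j}, penalty) - scores[i]
                    + node_score(X, j, nbrs[j] | {i}, penalty) - scores[j])
            if gain > best_gain:
                best_gain, best_edge = gain, (i, j)
        if best_edge is None:
            break
        i, j = best_edge
        nbrs[i].add(j)
        nbrs[j].add(i)
        scores[i] = node_score(X, i, nbrs[i], penalty)
        scores[j] = node_score(X, j, nbrs[j], penalty)
        edges.append(best_edge)
    return edges


if __name__ == "__main__":
    # Synthetic chain X0 - X1 - X2, with X3 independent of the rest.
    rng = np.random.default_rng(0)
    n = 2000
    x0 = rng.normal(size=n)
    x1 = x0 + 0.5 * rng.normal(size=n)
    x2 = x1 + 0.5 * rng.normal(size=n)
    x3 = rng.normal(size=n)
    print(greedy_structure(np.column_stack([x0, x1, x2, x3])))
```

Because adding an edge only affects the two endpoint terms, each candidate gain is computed locally; a scalable implementation would also cache gains between iterations rather than recomputing every pair, which this sketch does not attempt.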


Journal: SN Computer Science
Publication Date: Feb 7, 2020
Citations: 2

Similar Papers

Inferring Two-Level Hierarchical Gaussian Graphical Models to Discover Shared and Context-Specific Conditional Dependencies from High-Dimensional Heterogeneous Data
Mohammad S Rahman ... Ann E Nicholson
SN Computer Science, Vol. 1, 27 Jun 2020

Simultaneous estimation and clustering with finite mixture of nonparanormal graphical models
Hamid Haji Aghabozorgi ... Farzad Eskandari
Communications in Statistics - Simulation and Computation, ahead-of-print, 30 Sep 2023

Complex Normal Graphical Models
H H Andersen ... D Sørensen
01 Jan 1995

Overlapping Decomposition for Gaussian Graphical Modeling
Guojie Song ... Kunqing Xie
IEEE Transactions on Knowledge and Data Engineering, Vol. 27, 01 Aug 2015
