A general framework for association analysis of heterogeneous data

Gen Li, Irina Gaynanova

doi:10.1214/17-aoas1127

Abstract

Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional data with continuous measurements. Motivated by the Computer Audition Lab 500-song (CAL500) music annotation study, we develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two data sets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is devised for model fitting, and several variants are developed to expedite computation and achieve variable selection. The application to the CAL500 data sheds light on the relationship between acoustic features and semantic annotations, and provides effective means for automatic music annotation and retrieval.
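The permutation-based significance test mentioned in the abstract can be illustrated in general terms. What follows is a minimal sketch under stated assumptions, not the authors' procedure: the association statistic is a stand-in (the mean squared Pearson correlation over all cross-set column pairs), and the paper's own measure of association strength is not reproduced here. The function names `association_stat` and `permutation_pvalue` and the toy data are hypothetical. The idea is that shuffling the rows of one data set breaks any sample-wise link between the two sets while leaving each set's own structure intact, so the shuffled statistics approximate the null distribution.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / (saa * sbb) ** 0.5


def association_stat(X, Y):
    """Stand-in statistic (an assumption): mean squared correlation over all column pairs."""
    cols_x = list(zip(*X))
    cols_y = list(zip(*Y))
    total = sum(pearson(cx, cy) ** 2 for cx in cols_x for cy in cols_y)
    return total / (len(cols_x) * len(cols_y))


def permutation_pvalue(X, Y, n_perm=500, seed=0):
    """Permute the rows of Y to break the sample pairing, then compare statistics."""
    rng = random.Random(seed)
    observed = association_stat(X, Y)
    rows = list(range(len(Y)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(rows)
        Y_perm = [Y[i] for i in rows]
        if association_stat(X, Y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, so the p-value is never 0.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy heterogeneous data: X is continuous; Y has one binary and one count column.
# A shared latent variable drives the first column of each set.
data_rng = random.Random(1)
latent = [data_rng.gauss(0, 1) for _ in range(60)]
X = [[z + data_rng.gauss(0, 0.5), data_rng.gauss(0, 1)] for z in latent]
Y = [[1 if z + data_rng.gauss(0, 0.5) > 0 else 0, data_rng.randint(0, 5)] for z in latent]

stat, p_value = permutation_pvalue(X, Y, n_perm=200)
print(f"statistic = {stat:.3f}, permutation p-value = {p_value:.4f}")
```

Only the rows of one data set are permuted, and all columns of that set move together, which preserves the within-set dependence among variables. A correlation-based statistic is a crude choice for binary and count columns; it is used here only to keep the sketch short.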

Journal: The Annals of Applied Statistics
Publication Date: Sep 1, 2018
Citations: 25

Similar Papers

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data
Tahir Mehmood ... Zahid Rasheed
Communications for Statistical Applications and Methods | VOL. 22
30 Nov 2015

H-D and Subspace Clustering of Paradoxical High Dimensional Clinical Datasets with Dimension Reduction Techniques – a Model
S Rajeswari ... M S Josephine
Indian Journal of Science and Technology | VOL. 9
19 Oct 2016

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges
Jörg Rahnenführer ... Eugenia Migliavacca
BMC Medicine | VOL. 21
15 May 2023

Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets Via Generative Models
Yasin Yilmaz ... Mehmet Aktukmak
IEEE Transactions on Signal Processing | VOL. 69
01 Jan 2020
