Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer.

Gordon Okimoto, Michael Loomis, Sandi Kwee, Ashkan Zeinalzadeh, Tiphaine Fabre, Owen Chan, Brenda Hernandez, James B Nation, Maarit Tiirikainen, Linda Wong, Tom Wenska

doi:10.1186/s13040-016-0103-7

Abstract

Background: Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1.

Methods: The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of “sparse” left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single “sparsity” parameter that is selected based on the false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on “residual” data matrices that result from a given sparse approximation.

Results: We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDS. On real multimodal data for ovarian and liver cancer we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology.

Conclusions: Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.

Electronic supplementary material: The online version of this article (doi:10.1186/s13040-016-0103-7) contains supplementary material, which is available to authorized users.
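The joint rank-1 idea described in the Methods can be illustrated with a small sketch. This is not the authors' JAMMIT implementation: it assumes the matrices are stacked row-wise, and it uses a generic alternating scheme with soft-thresholding as a stand-in for the sparse approximation. The function names (`soft_threshold`, `sparse_rank1`, `deflate`) and the threshold `lam` are hypothetical, and the permutation-based selection of the sparsity parameter is not shown.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(matrices, lam, n_iter=200, tol=1e-8):
    """Sparse rank-1 fit of row-stacked matrices that share the same columns.

    Returns (us, v): one sparse left vector per matrix and one shared,
    unit-norm right vector defined on the common columns (samples).
    Illustrative only; `lam` is an assumed thresholding level.
    """
    D = np.vstack(matrices)  # shape: (p_1 + ... + p_K, n)
    # Start from the leading right-singular vector of the stacked matrix.
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    v = vt[0]
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        u_new = soft_threshold(D @ v, lam)  # sparse update of the left vector
        if not u_new.any():                 # lam too large: nothing survives
            u = u_new
            break
        v_new = D.T @ u_new
        v_new /= np.linalg.norm(v_new)      # keep the shared signal at unit norm
        converged = np.linalg.norm(u_new - u) < tol * max(1.0, np.linalg.norm(u))
        u, v = u_new, v_new
        if converged:
            break
    # Split the stacked left vector back into one block per data type.
    boundaries = np.cumsum([m.shape[0] for m in matrices])[:-1]
    return np.split(u, boundaries), v

def deflate(matrices, us, v):
    """Residual matrices after subtracting each matrix's rank-1 fit."""
    return [m - np.outer(u, v) for m, u in zip(matrices, us)]
```

Under these assumptions, `us, v = sparse_rank1([A, B], lam)` returns one sparse vector per data type whose non-zero rows play the role of a signature, and calling `sparse_rank1` again on `deflate([A, B], us, v)` mimics the search for a second signal in the residual matrices.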

Highlights

Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples.
Advances in array technology, high-throughput sequencing, and clinical imaging platforms enable the measurement of tens of thousands of variables of a specific data type in a fixed set of tissue samples [1,2,3,4]. Such “big” data types include genome-wide measurements of messenger RNA and microRNA expression, DNA methylation, single nucleotide polymorphisms (SNPs), next-generation sequence data, and quantitative features extracted from Positron Emission Tomography (PET) images.
We describe in greater detail a workflow for the joint analysis of multiple data types based on the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm.

Summary

Introduction

Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. Advances in array technology, high-throughput sequencing, and clinical imaging platforms enable the measurement of tens of thousands of variables of a specific data type in a fixed set of tissue samples [1,2,3,4]. Such “big” data types include genome-wide measurements of messenger RNA (mRNA) and microRNA expression, DNA methylation, single nucleotide polymorphisms (SNPs), next-generation sequence data, and quantitative features extracted from Positron Emission Tomography (PET) images.

The low signal-to-noise ratio (SNR) of such data is due in large part to the relatively small number of variables (out of many thousands measured) that truly represent a Signal of Interest (SOI) in the data that is associated with an important biological and/or clinical attribute of the samples. In this context, we are interested in selecting s > 0 rows of a data matrix D that best approximate a dominant SOI in the row-space of D that may represent a clinically and/or biologically significant attribute of the samples. We call this subset of variables a signature in D, and if D is big, we assume that the signature is “sparse” in D, i.e., s ≪ p, where p is the total number of rows (variables) in D.
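One way to make the row-selection problem concrete is an ℓ1-penalized rank-1 approximation. The formulation below is an illustrative assumption, not an objective stated in this summary: it takes D to be a p × n matrix whose rows are variables and whose columns are samples, and it introduces a hypothetical sparsity parameter λ ≥ 0.

```latex
\min_{u \in \mathbb{R}^{p},\; v \in \mathbb{R}^{n},\; \|v\|_2 = 1}
\; \| D - u v^{\top} \|_F^{2} + \lambda \, \| u \|_1 ,
\qquad
s = \#\{\, i : u_i \neq 0 \,\} \ll p .
```

Read this way, the signature is the set of rows i with u_i ≠ 0, the vector v approximates the SOI in the row-space of D, and a larger λ yields a smaller s.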



Journal: BioData Mining
Publication Date: Jul 29, 2016
Citations: 9
License type: CC BY 4.0
