Abstract
INTRODUCTION: The aim of this study is to validate a novel algorithm called the Joint Analysis of Many Matrices (JAMM) that detects correlations between data matrices associated with multiple data types acquired from a common set of samples. METHODS: Let A = {A1, A2, …, AN} be a collection of data matrices associated with N different data types acquired from a common set of n samples, where each Ai has dimensions pi × n. For each Ai, the JAMM algorithm computes a rank-1 approximation Ai ≈ Ui*X, where Ui (pi × 1) is the top eigenarray for Ai and X (1 × n) is the top eigengene common to all the Ai. Since X = ∑i transpose(Ui)*Ai, the rows of each Ai that are most correlated with X (and hence with each other) can be identified by applying the universal threshold Ti = σi*√[2*log(pi)] to Ui, where σi is the standard deviation of Ui. The above procedure is iterated on “residualized” versions of the original data to detect additional signals of cross-correlation until the noise floor is reached. JAMM detection performance was assessed using area under the ROC curve (AUC) on N = 3 simulated data matrices, each containing 3 distinct signals, embedded in a Gaussian noise background, that correlated pre-selected rows of each matrix. Resampling methods and Ingenuity Pathway Analysis were used to assess the significance of signals detected by JAMM in real data sets that juxtaposed gene expression and DNA methylation (liver cancer), mRNA and microRNA expression (glioblastoma), and gene expression and PET imaging features (liver cancer). RESULTS: On simulated data, JAMM consistently outperformed three other algorithms (Generalized Singular Value Decomposition, Canonical Correlation Analysis, and the recently developed JIVE algorithm) based on mean AUC for all 3 signals in each data matrix over 100 simulations.
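The rank-1 step, the universal threshold, and the residualization described in METHODS can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract does not say how the common X is computed, so the sketch assumes it comes from the top singular triplet of the row-stacked matrices; in that construction the sum of transpose(Ui)*Ai over the data types reproduces X exactly. Comparing the absolute value of each loading with Ti (to cope with the sign ambiguity of the SVD) is also an assumption, and the function names and the toy simulation parameters are hypothetical.

```python
import numpy as np


def joint_rank1(mats):
    """Return ([U_1..U_N], X) with A_i ~ outer(U_i, X).

    Assumption: the shared X is taken from the top singular triplet of the
    row-stacked matrices; the abstract does not specify this step.
    """
    stacked = np.vstack(mats)                       # (sum of p_i) x n
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    x = s[0] * vt[0]                                # common eigengene, length n
    cuts = np.cumsum([m.shape[0] for m in mats])[:-1]
    us = np.split(u[:, 0], cuts)                    # one eigenarray per data type
    return us, x


def universal_threshold(u_i):
    """T_i = sigma_i * sqrt(2 * log(p_i)), with sigma_i the std of U_i."""
    return np.std(u_i) * np.sqrt(2.0 * np.log(u_i.shape[0]))


def select_rows(u_i):
    """Indices of rows whose |loading| exceeds the universal threshold."""
    return np.flatnonzero(np.abs(u_i) > universal_threshold(u_i))


def residualize(mats, us, x):
    """Subtract the rank-1 fit so that a further signal can be sought."""
    return [m - np.outer(u, x) for m, u in zip(mats, us)]


def simulate(seed=0, n=50, sizes=(200, 300, 150), k=5, amp=4.0):
    """Toy data: the first k rows of each matrix share one signal
    embedded in unit-variance Gaussian noise (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    signal = rng.standard_normal(n)
    mats = []
    for p in sizes:
        a = rng.standard_normal((p, n))
        a[:k] += amp * signal
        mats.append(a)
    return mats


mats = simulate()
us, x = joint_rank1(mats)
for i, u_i in enumerate(us):
    print(f"A{i + 1}: rows above threshold -> {select_rows(u_i).tolist()}")
```

Calling `residualize(mats, us, x)` and passing the result back to `joint_rank1` mimics the iteration on residualized data; a stopping rule for the noise floor is left out because the abstract does not define one.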
On real data, JAMM analysis of mRNA and microRNA data from 234 glioblastoma samples from The Cancer Genome Atlas recapitulated survival-related tumor subtypes previously discovered based on gene expression alone. Moreover, JAMM analysis of genome-wide expression and DNA methylation data yielded a compact, methylation-driven gene expression signature that segregated tumor and normal samples and showed significant enrichment for biological processes related to liver cancer. Finally, JAMM analysis of mRNA and PET imaging data identified a novel liver cancer subtype characterized by dysregulation of pathways involved in bile acid metabolism. CONCLUSIONS: We have demonstrated on simulated and real data sets that the JAMM algorithm for the joint analysis of multiple data types compares favorably with other computational methods in terms of speed, accuracy and biological enrichment. For real data sets, the JAMM algorithm appears to constrain the size of signatures found for each data type with minimal loss of information when compared to signatures obtained from a single data type alone. Citation Format: Gordon S. Okimoto. The integrated analysis of multiple, high-dimensional data types by joint matrix approximations of rank-1 with applications to liver cancer and glioblastoma. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 355. doi:10.1158/1538-7445.AM2014-355