Abstract

Background: Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools that integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA that capture nonlinear associations, such as kernel CCA, allowed neither feature selection nor the capture of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), which selects appropriate kernels in the framework of multiple kernel learning.

Results: TSKCCA first selects relevant kernels based on the Hilbert-Schmidt independence criterion (HSIC) in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple, nonlinear associations among high-dimensional data and multiplicative interactions among variables.

Conclusions: TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.
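The HSIC criterion used for kernel selection can be illustrated with a minimal sketch. It assumes the commonly used biased empirical estimator, tr(KHLH)/(n-1)^2 with H the centering matrix, and Gaussian kernels with a fixed bandwidth; the function names, the bandwidth, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for samples in x (illustrative bandwidth)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC between two Gram matrices of the same size."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy data (assumed): y_dep depends on x nonlinearly, y_ind does not.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y_dep = x ** 2 + 0.1 * rng.normal(size=200)
y_ind = rng.normal(size=200)

hsic_dep = hsic(gaussian_kernel(x), gaussian_kernel(y_dep))
hsic_ind = hsic(gaussian_kernel(x), gaussian_kernel(y_ind))
print(hsic_dep, hsic_ind)
```

Here the quadratic relationship has near-zero linear correlation with x, yet its HSIC value should be clearly larger than that of the independent pair, which is the property that makes HSIC usable as a relevance score for kernels.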

Highlights

  • Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools that integrate high-dimensional data from different sources

  • We briefly review the bases of our proposed method, namely linear canonical correlation analysis (CCA), kernel CCA (KCCA), and multiple kernel learning (MKL)

  • We experimentally evaluate the performance of our proposed two-stage kernel CCA (TSKCCA), sparse additive functional CCA (SAFCCA) [12], and other methods using synthetic data and nutrigenomic experimental data


Summary

Introduction

Canonical correlation analysis (CCA) [1] is a statistical method for finding common information in two different sources of multivariate data. It optimizes linear projection vectors so that the projections of two random multivariate datasets are maximally correlated. Kernel CCA (KCCA) was introduced to capture nonlinear associations between two blocks of multivariate data. We propose two-stage kernel CCA (TSKCCA), which enables us to select sparse features. In the second stage, we apply standard KCCA, using the target kernels obtained in the first stage, to find multiple nonlinear correlations.
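As a point of reference for the linear case, the following is a minimal sketch of classical CCA computed by whitening each block and taking a singular value decomposition of the whitened cross-covariance. The function name `linear_cca`, the small ridge term `reg`, and the toy data are assumptions made for illustration; this is plain linear CCA, not KCCA or TSKCCA.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-8):
    """Canonical correlations and projection vectors for data blocks X, Y.

    Rows are samples. `reg` is a small ridge term (an assumption here)
    that keeps the covariance matrices positive definite.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map singular vectors back to the original coordinates
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B

# Toy data (assumed): one latent variable z shared by both blocks.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.3 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n),
                     -z + 0.3 * rng.normal(size=n)])

s, A, B = linear_cca(X, Y)
u = (X - X.mean(axis=0)) @ A[:, 0]
v = (Y - Y.mean(axis=0)) @ B[:, 0]
first_corr = np.corrcoef(u, v)[0, 1]
print(s, first_corr)
```

The singular values `s` are the canonical correlations in decreasing order, and the columns of `A` and `B` are the corresponding projection vectors. Kernel CCA follows the same idea with Gram matrices in place of the raw data blocks, which is what allows nonlinear associations to be captured.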
