Abstract

There is growing interest in studying the dependencies between multiple data sources. A common way to analyze the relationship between a pair of data sources based on their correlation is canonical correlation analysis (CCA), which seeks linear combinations of all variables from each dataset that maximize the correlation between them. However, in high-dimensional datasets, such as genomic data, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both datasets, CCA may not be applicable. In this paper, we present a novel method to extract common features from a pair of data sources using local principal components and Kendall's ranking. The results show that the proposed method outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed method yields meaningful results when the number of variables exceeds the number of observed units.
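The high-dimensional limitation of CCA mentioned above can be illustrated with a minimal NumPy sketch. It covers classical CCA only, not the method proposed in the paper, and the function names are hypothetical. It computes the largest sample canonical correlation as the cosine of the smallest principal angle between the column spaces of the centered datasets. Under these assumptions, once the number of variables reaches the number of observations, even independent noise gives a sample canonical correlation of essentially 1.

```python
import numpy as np


def orthonormal_basis(A, tol=1e-10):
    """Orthonormal basis for the column space of A (hypothetical helper)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Keep only directions with non-negligible singular values,
    # so collinear (rank-deficient) data are handled.
    return U[:, s > tol * s.max()]


def first_canonical_correlation(X, Y):
    """Largest sample canonical correlation between X (n x p) and Y (n x q)."""
    # Center every variable (column).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux = orthonormal_basis(Xc)
    Uy = orthonormal_basis(Yc)
    # Singular values of Ux^T Uy are the cosines of the principal angles
    # between the two column spaces, i.e. the canonical correlations.
    s = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    # Clip tiny floating-point overshoot above 1.
    return float(min(s[0], 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Many observations, few variables: independent noise.
    X = rng.standard_normal((200, 3))
    Y = rng.standard_normal((200, 3))
    print("n=200, p=q=3: ", first_canonical_correlation(X, Y))

    # More variables than observations: still independent noise.
    X = rng.standard_normal((20, 30))
    Y = rng.standard_normal((20, 30))
    print("n=20, p=q=30:", first_canonical_correlation(X, Y))
```

In the second case each centered dataset spans the entire (n - 1)-dimensional space of mean-zero vectors, so the two column spaces coincide and the sample correlation carries no information about the true dependence.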
