rCUR: an R package for CUR matrix decomposition

András Bodor, István Csabai, Michael W Mahoney, Norbert Solymosi

doi:10.1186/1471-2105-13-103

Abstract

Background: Many methods for dimensionality reduction of large data sets, such as those generated in microarray studies, boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and can often be quite useful as a tool to summarize the data, they are linear combinations of up to all of the data points, and thus it is typically quite hard to interpret those vectors in terms of the application domain from which the data are drawn. Recently, an alternative dimensionality reduction paradigm, CUR matrix decompositions, has been proposed to address this problem and has been applied to genetic and internet data. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Since they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn.

Results: We present an implementation to perform CUR matrix decompositions, in the form of a freely available, open-source R package called rCUR. This package will help users to perform CUR-based analysis on large-scale data, such as those obtained from different high-throughput technologies, in an interactive and exploratory manner. We show two examples that illustrate how CUR-based techniques make it possible to significantly reduce the number of probes, while at the same time maintaining major trends in the data and keeping the same classification accuracy.

Conclusions: The package rCUR provides functions for users to perform CUR-based matrix decompositions in the R environment. In gene expression studies, it offers an additional way to analyze differential expression and to select discriminant genes, based on statistical leverage scores. These scores, which have been used historically in diagnostic regression analysis to identify outliers, can be used by rCUR to identify the most informative data points with respect to which to express the remaining data points.
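The leverage-score idea described in the abstract can be illustrated with a short sketch. This is a minimal NumPy illustration, not the rCUR API: the function names `leverage_scores`, `top_leverage_columns`, and `cur_from_indices` are hypothetical, the deterministic "keep the highest scores" rule is only one simple selection strategy (CUR methods are often described with randomized sampling proportional to the scores), and taking the middle matrix as the product of pseudoinverses is an assumed, commonly used choice.

```python
import numpy as np


def leverage_scores(A, k):
    """Normalized column leverage scores of A for the top-k right singular vectors.

    The score of column j is (1/k) * sum over the top-k right singular
    vectors of the squared j-th entry, so the scores sum to 1 when
    k does not exceed the rank of A.
    """
    # Rows of Vt are the right singular vectors, ordered by singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0) / k


def top_leverage_columns(A, k, c):
    """Indices of the c columns of A with the largest leverage scores."""
    scores = leverage_scores(A, k)
    # argsort is ascending, so reverse it and keep the first c indices.
    picked = np.argsort(scores)[::-1][:c]
    return np.sort(picked)


def cur_from_indices(A, cols, rows):
    """Build C, U, R from chosen column and row indices of A."""
    C = A[:, cols]  # actual columns of the data matrix
    R = A[rows, :]  # actual rows of the data matrix
    # Assumed choice of the middle matrix: U = pinv(C) @ A @ pinv(R).
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R
```

Row scores follow from the same function applied to the transpose, for example `top_leverage_columns(A.T, k, r)`. Because `C` and `R` are slices of `A`, each selected column or row maps directly back to a probe or a sample in the original data.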

Highlights

Many methods for dimensionality reduction of large data sets, such as those generated in microarray studies, boil down to the Singular Value Decomposition (SVD).
The singular vectors, or principal components, associated with the largest singular values have strong optimality properties, and they can often be quite useful as a tool to summarize and identify major patterns of the data. (See, e.g., [1], as a nice example in the field of genomics and [2] for a fast matrix factorization algorithm.) However, it is typically quite hard for a geneticist or downstream data analyst to interpret those vectors in terms of the application domain from which the data are drawn.
We illustrate the benefits of CUR matrix decompositions and dimension reduction with the rCUR package by comparing it with two different previously published case studies.

Summary

Introduction

Many methods for dimensionality reduction of large data sets, such as those generated in microarray studies, boil down to the Singular Value Decomposition (SVD). The singular vectors, or principal components, associated with the largest singular values have strong optimality properties, and they can often be quite useful as a tool to summarize and identify major patterns of the data. (See, e.g., [1], as a nice example in the field of genomics and [2] for a fast matrix factorization algorithm.) However, it is typically quite hard for a geneticist or downstream data analyst to interpret those vectors in terms of the application domain from which the data are drawn. The reason for this is that the singular vectors are mathematical abstractions defined for any matrix, and they are typically linear combinations of all of the input data. “It would be interesting to try to find basis vectors for all experiment vectors, using actual experiment vectors and not artificial bases that offer little insight.”
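The contrast between the two factorizations can be written out explicitly. The notation below is an assumption made for illustration (it does not appear in the text above): the data matrix is A with m rows and n columns, the rank parameter is k, and the middle matrix U of the CUR form is unrelated to the left singular vectors u^{(i)} of the SVD.

```latex
% Rank-k truncated SVD: every basis vector mixes all columns of A.
A_k = \sum_{i=1}^{k} \sigma_i \, u^{(i)} \, {v^{(i)}}^{T},
\qquad
u^{(i)} = \frac{1}{\sigma_i} A \, v^{(i)}
        = \frac{1}{\sigma_i} \sum_{j=1}^{n} v^{(i)}_j \, A_{:,j}

% CUR: the outer factors are actual columns and rows of A.
A \approx C \, U \, R,
\qquad
C \in \mathbb{R}^{m \times c}, \quad
U \in \mathbb{R}^{c \times r}, \quad
R \in \mathbb{R}^{r \times n}
```

Here A_{:,j} denotes the j-th column of A, C collects c actual columns of A, and R collects r actual rows. One common choice for the middle factor, stated here as an assumption, is U = C^{+} A R^{+}, where ^{+} denotes the Moore-Penrose pseudoinverse.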



Journal: BMC Bioinformatics
Publication Date: May 17, 2012
Citations: 30
License type: CC BY
