Abstract
Let Z be the union of a training set X and a testing set Y. Assume that a kernel method produces a dimensionality reduction (DR) mapping P that maps the high-dimensional data X to its low-dimensional representation P(X). The out-of-sample extension problem in dimensionality reduction is to find the DR of Y using an extension of P instead of re-training on the whole set Z. In this paper, using the framework of reproducing kernel Hilbert space theory, we introduce a least-squares approach to extending the popular DR mapping known as Diffusion maps (Dmaps). We establish a theoretical analysis of the out-of-sample DR Dmaps, which also provides a uniform treatment of many popular out-of-sample algorithms based on kernel methods. We also illustrate the validity of the developed out-of-sample DR algorithms in several examples.
Highlights
In many scientific and technological areas, we need to analyze and process high-dimensional data, such as speech signals, images and videos, text documents, stock trade records, and others.
Re-training on the whole set Z is often impractical if the cardinality of X becomes very large or the new data arrive as a time stream.
The main purpose of this paper is to give a mathematical analysis of the out-of-sample dimensionality reduction (DR) extension of Diffusion maps (Dmaps).
Summary
In many scientific and technological areas, we need to analyze and process high-dimensional data, such as speech signals, images and videos, text documents, stock trade records, and others. To reduce the dimension of such data sets, people employ non-linear DR methods [6,7,8,9,10,11,12], among which the method of Diffusion Maps (Dmaps), introduced by Coifman and his research group [13, 14], has proved attractive and effective. Dmaps employs the diffusion kernel to define the similarity on a given data set X ⊂ R^D. Re-training on the whole set Z is often impractical if the cardinality of X becomes very large or the new data arrive as a time stream. The main purpose of this paper is to give a mathematical analysis of the out-of-sample DR extension of Dmaps. We give several examples of the extension.
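To make the out-of-sample idea concrete, the following is a minimal sketch of Diffusion maps on a training set X together with a standard Nyström-type extension to new points Y. It is not the least-squares RKHS construction developed in the paper; it only illustrates the general pattern of reusing the trained mapping instead of re-training on Z. The Gaussian kernel form, the bandwidth parameter eps, the diffusion time t = 1, and all function names (gaussian_kernel, fit_dmaps, embed_train, extend) are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, eps):
    """Pairwise kernel exp(-||a - b||^2 / eps) between rows of A and rows of B."""
    d2 = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(d2, 0.0) / eps)


def fit_dmaps(X, eps, d):
    """Train Dmaps on X and keep the d leading non-trivial eigenpairs."""
    K = gaussian_kernel(X, X, eps)
    deg = K.sum(axis=1)
    # Symmetric conjugate A = D^{-1/2} K D^{-1/2} of the Markov matrix P = D^{-1} K;
    # A and P share eigenvalues, and A can be handled by a symmetric eigensolver.
    A = K / np.sqrt(np.outer(deg, deg))
    w, V = np.linalg.eigh(A)
    order = np.argsort(w)[::-1]  # eigh returns ascending order; sort descending
    w, V = w[order], V[:, order]
    # Right eigenvectors of P are psi = D^{-1/2} v.
    psi = V / np.sqrt(deg)[:, None]
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return {"X": X, "eps": eps, "lam": w[1 : d + 1], "psi": psi[:, 1 : d + 1]}


def embed_train(model):
    """Diffusion coordinates lam_k * psi_k(x_i) of the training points (t = 1)."""
    return model["psi"] * model["lam"]


def extend(model, Y):
    """Nystrom-type extension: psi_k(y) = (1 / lam_k) * sum_j p(y, x_j) psi_k(x_j).

    The diffusion coordinate lam_k * psi_k(y) therefore reduces to a
    row-normalized kernel average of the trained eigenvectors.
    """
    Kyx = gaussian_kernel(Y, model["X"], model["eps"])
    Pyx = Kyx / Kyx.sum(axis=1, keepdims=True)
    return Pyx @ model["psi"]
```

A typical call would be model = fit_dmaps(X, eps, d) followed by extend(model, Y). One sanity check for this kind of extension is that applying it to the training points themselves reproduces embed_train(model) up to rounding, since the rows of the normalized kernel then coincide with the rows of the Markov matrix.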