Abstract

Many modern data mining applications are concerned with the analysis of datasets in which the observations are described by paired high‐dimensional vectorial representations or ‘views’. Typical examples can be found in web mining and genomics applications. In this article we present an algorithm for data clustering with multiple views, multi‐view predictive partitioning (MVPP), which relies on a novel criterion of predictive similarity between data points. We assume that, within each cluster, the dependence between multivariate views can be modeled by a two‐block partial least squares (TB‐PLS) regression model, which performs dimensionality reduction and is particularly suitable for high‐dimensional settings. The proposed MVPP algorithm partitions the data such that the within‐cluster predictive ability between views is maximized. The proposed objective function depends on a measure of the predictive influence of points under the TB‐PLS model, which has been derived as an extension of the predicted residual sums of squares (PRESS) statistic commonly used in ordinary least squares regression. Using simulated data, we compare the performance of MVPP to that of competing multi‐view clustering methods which rely upon geometric structures of points but ignore the predictive relationship between the two views. State‐of‐the‐art results are obtained on benchmark web mining datasets. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
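
For readers unfamiliar with the PRESS statistic mentioned above, the following is a minimal sketch of its standard ordinary least squares form only. It is not the TB‐PLS extension derived in the article, and the function names `press_ols` and `press_ols_bruteforce` are hypothetical, chosen for illustration. The sketch uses the well-known identity that the leave-one-out residual of point i equals its ordinary residual divided by (1 − h_ii), where h_ii is the i-th diagonal entry of the hat matrix, and checks it against explicit refitting.

```python
import numpy as np


def press_ols(X, y):
    """PRESS for OLS with an intercept, using the leverage shortcut.

    Illustrative helper (not from the article): returns the PRESS value
    and the per-point leave-one-out residuals.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    H = X1 @ np.linalg.pinv(X1)                 # hat matrix H = X (X^T X)^+ X^T
    residuals = y - H @ y                       # ordinary fitted residuals
    leverage = np.diag(H)                       # h_ii for each observation
    loo_residuals = residuals / (1.0 - leverage)
    return float(np.sum(loo_residuals ** 2)), loo_residuals


def press_ols_bruteforce(X, y):
    """Same quantity computed by refitting the model n times (for checking)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                # drop observation i
        beta, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        total += (y[i] - X1[i] @ beta) ** 2     # squared held-out error
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
    press, loo = press_ols(X, y)
    print(np.isclose(press, press_ols_bruteforce(X, y)))
```

A point with a large leave-one-out residual is one that the model fitted to the remaining points predicts poorly. The abstract describes a per-point predictive influence measure built on this kind of quantity, but under the TB‐PLS model rather than OLS.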
