Abstract

Computational tools in modern data analysis must be scalable to satisfy business and research time constraints. In this regard, two alternatives are possible: (i) adapt existing algorithms, or design new ones, so that they can run in a distributed computing environment, or (ii) develop model-based learning techniques that can be trained efficiently on a small subset of the data and still make reliable predictions. In this chapter, two recent algorithms, one for each direction, are reviewed. In particular, the first part describes a scalable in-memory spectral clustering algorithm. This technique relies on a kernel-based formulation of the spectral clustering problem, also known as kernel spectral clustering. More precisely, a finite-dimensional approximation of the feature map, obtained via the Nyström method, is used to solve the primal optimization problem, which reduces the computational complexity from cubic to linear in the number of data points. The second part illustrates a distributed clustering approach with a fixed computational budget. This method extends the k-means algorithm by applying regularization at the level of the prototype vectors. An optimal stochastic gradient descent scheme for learning with \(l_1\) and \(l_2\) norms is utilized, which makes the approach less sensitive to the influence of outliers while computing the prototype vectors.
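To make the first idea concrete, the following is a minimal sketch of spectral clustering with a Nyström-approximated feature map. It is not the chapter's exact kernel spectral clustering primal formulation: it assumes an RBF kernel, uniformly sampled landmarks, and a k-means step in the approximate embedding in place of the KSC decision rule. With \(m\) landmarks the cost is \(O(nm^2)\), i.e. linear in the number of points \(n\).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

def nystrom_spectral_clustering(X, n_clusters, n_landmarks=200, gamma=1.0, seed=0):
    """Approximate spectral clustering via a Nystrom feature map (sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    landmarks = X[idx]

    W = rbf_kernel(landmarks, landmarks, gamma=gamma)  # m x m landmark kernel block
    C = rbf_kernel(X, landmarks, gamma=gamma)          # n x m cross-kernel

    # Eigendecomposition of the small block gives the finite-dimensional
    # approximate feature map  Z = C U diag(1/sqrt(lambda)).
    lam, U = np.linalg.eigh(W)
    keep = lam > 1e-10                                 # drop near-null directions
    Z = C @ U[:, keep] / np.sqrt(lam[keep])

    # Normalize rows and cluster in the approximate spectral embedding.
    Z /= np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)
```

For the second part, the sketch below illustrates stochastic-gradient k-means with \(l_1\) and \(l_2\) regularization applied to the prototype vectors. It is an assumed, generic update scheme (one point per step, a per-prototype \(1/t\) learning-rate decay, \(l_2\) shrinkage plus an \(l_1\) soft-thresholding step), not the optimal scheme derived in the chapter; it only shows how prototype-level regularization damps the pull of outliers.

```python
import numpy as np

def regularized_sgd_kmeans(X, n_clusters, n_iter=10000, lr0=0.5,
                           l1=0.0, l2=1e-3, seed=0):
    """Stochastic k-means with l1/l2 shrinkage of the prototypes (sketch)."""
    rng = np.random.default_rng(seed)
    protos = X[rng.choice(X.shape[0], n_clusters, replace=False)].copy()
    counts = np.ones(n_clusters)                        # per-prototype step counters

    for _ in range(n_iter):
        x = X[rng.integers(X.shape[0])]
        j = np.argmin(np.linalg.norm(protos - x, axis=1))  # winning prototype
        eta = lr0 / counts[j]
        counts[j] += 1
        # Gradient step on 0.5*||x - w||^2 + 0.5*l2*||w||^2 ...
        protos[j] += eta * ((x - protos[j]) - l2 * protos[j])
        # ... followed by proximal soft-thresholding for the l1 term.
        protos[j] = np.sign(protos[j]) * np.maximum(np.abs(protos[j]) - eta * l1, 0.0)

    labels = np.argmin(np.linalg.norm(X[:, None, :] - protos[None], axis=2), axis=1)
    return protos, labels
```

Because each step touches a single point and a single prototype, the per-iteration cost is independent of the dataset size, which is what allows a fixed computational budget in a distributed setting.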
