Abstract
With massive, high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. In this paper, we review scalable algorithms for learning statistical models from high-dimensional data. In particular, we introduce two classes of techniques: lossless and lossy compression. The first is a method based on grammar compression, a lossless compression scheme for text that has been successfully applied to binary data matrices for scalable learning of statistical models. The second is a family of lossy compression methods known as feature maps (FMs). Recently, many FMs for kernel approximation have been proposed and applied in practice. These methods, of which we present a brief survey in this paper, open the door to large-scale analyses of massive, high-dimensional data.
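To make the second class of techniques concrete, random Fourier features are one widely used feature map for approximating the Gaussian (RBF) kernel: an explicit low-dimensional map z(x) is drawn at random so that the inner product z(x)·z(y) approximates the kernel value k(x, y). The following is a minimal, pure-Python sketch of this idea, not code from the surveyed paper; all names and parameter choices are illustrative.

```python
import math
import random

def rbf_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel value k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def random_fourier_features(dim, n_features, sigma=1.0, seed=0):
    """Return an explicit feature map z such that z(x).z(y) ~= rbf_kernel(x, y).

    Each feature is sqrt(2/D) * cos(w_i . x + b_i), with w_i drawn from a
    Gaussian matching the kernel's spectral density and b_i uniform in [0, 2pi].
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return z

# The inner product of the explicit maps approximates the kernel, so any
# linear learner on z(x) approximates the corresponding kernel machine.
z = random_fourier_features(dim=3, n_features=2000, seed=42)
x, y = [0.2, -0.1, 0.5], [0.1, 0.3, -0.2]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf_kernel(x, y)
```

The approximation error shrinks as the number of features grows (roughly O(1/sqrt(D))), which is what makes such maps attractive for large-scale kernel learning: training cost becomes linear in the number of samples rather than quadratic.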