Abstract
We propose a quantum algorithm for training nonlinear support vector machines (SVM) for feature space learning where classical input data is encoded in the amplitudes of quantum states. Based on the classical SVM-perf algorithm of Joachims \cite{joachims2006training}, our algorithm has a running time which scales linearly in the number of training examplesm(up to polylogarithmic factors) and applies to the standard soft-marginℓ1-SVM model. In contrast, while classical SVM-perf has demonstrated impressive performance on both linear and nonlinear SVMs, its efficiency is guaranteed only in certain cases: it achieves linearmscaling only for linear SVMs, where classification is performed in the original input data space, or for the special cases of low-rank or shift-invariant kernels. Similarly, previously proposed quantum algorithms either have super-linear scaling inm, or else apply to different SVM models such as the hard-margin or least squaresℓ2-SVM which lack certain desirable properties of the soft-marginℓ1-SVM model. We classically simulate our algorithm and give evidence that it can perform well in practice, and not only for asymptotically large data sets.
Highlights
Support vector machines (SVMs) are powerful supervised learning models which perform classification by identifying a decision surface which separates data according to their labels [2, 3]
We have proposed a quantum extension of support vector machines (SVM)-perf for training nonlinear soft-margin 1-SVMs in time linear in the number of training examples m, up to polylogarithmic factors, and given numerical evidence that the algorithm can perform well in practice as well as in theory
This goes beyond classical SVMperf, which achieves linear m scaling only for linear
Summary
Support vector machines (SVMs) are powerful supervised learning models which perform classification by identifying a decision surface which separates data according to their labels [2, 3]. When K admits a low-rank approximation though, sampling-based approaches such as the Nystrom method [10] or incomplete Cholesky factorization [11] can be used to obtain O(m) running times, it may not be clear a priori whether such a low-rank approximation is possible Another special case corresponds to so-called shift-invariant kernels [12], which include the popular Gaussian radial basis function (RBF) kernel, where classical sampling techniques can be used to map the high dimensional data into a random low dimensional feature space, which can be trained by fast linear methods. This means that the procedure cannot be de-quantized in the same way
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.