Fast adaptive sampling with operation time control

A.S. Algasov, S.A. Guda, V.I. Kolesnikov, V.V. Ilicheva, A.V. Soldatov

doi:10.1016/j.jocs.2023.101946

Abstract

The rational choice of sampling points is crucial for supervised machine learning algorithms. Adaptive sampling ensures an optimal distribution of points in multidimensional space and provides higher approximation accuracy than random sampling. Most approaches to adaptive sampling maximize a function that evaluates the compromise between local exploitation and global exploration. However, existing accurate methods suffer from long sampling times, while fast methods are not accurate. Only a few reported methods are available as open-source implementations. We introduce a fast adaptive sampling algorithm with accuracy comparable to that of the best-known adaptive sampling methods and present a comparative study of different approaches to minimizing approximation errors. The method scales linearly with the number of sampling points and supports batch generation of new points. The optimization function is an integral norm of the approximation error, the Lp norm, where the norm parameter p regulates sampling, inclining it towards either local exploitation or global exploration. A small Lp norm is achieved by reducing both the function approximation error and the size of the region with large variation after a new sampling point is added. Our solution is similar to kriging in how it chooses between local exploitation and global exploration. The difference lies in the error estimate, which depends on the values of the function at the sampling points, in contrast to the homoscedastic variance estimate of kriging. Fast and accurate adaptive sampling is of interest for supervised machine learning tasks such as training a model interatomic potential, quantitative analysis of spectra, self-driving chemical laboratories, and many others. The source code for the new adaptive sampling algorithm, MinLpE, is available on GitHub and is distributed under an LGPL license.
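
As a rough illustration of how the norm parameter p can shift the emphasis of an Lp error norm, the sketch below computes a discrete Lp norm for two made-up error profiles. It is not the MinLpE implementation: the function `lp_norm`, the profiles `peaked` and `broad`, and the discretization by a simple mean over 100 locations are all assumptions chosen for illustration. It relies only on the general property that a larger p makes the norm increasingly dominated by the largest errors.

```python
def lp_norm(errors, p):
    """Discrete Lp norm of an error profile: (mean(|e|^p))^(1/p).

    Hypothetical helper for illustration; the mean over sample
    locations stands in for the integral over the domain.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    n = len(errors)
    return (sum(abs(e) ** p for e in errors) / n) ** (1.0 / p)


# Two invented error profiles over 100 locations (assumed data):
# a single large error in one spot versus a moderate error everywhere.
peaked = [1.0] + [0.0] * 99
broad = [0.2] * 100

for p in (1, 8):
    print(f"p={p}: peaked={lp_norm(peaked, p):.3f}, broad={lp_norm(broad, p):.3f}")
```

With p = 1 the broad profile has the larger norm, so reducing the norm favours spreading effort over the whole domain; with p = 8 the peaked profile has the larger norm, so reducing it favours refining the single high-error spot.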

Full Text