We study a class of spectral learning algorithms with dependent observations, including ridge regression, Landweber iteration, and spectral cut-off. We derive an explicit risk bound in terms of the correlation of the observations, the regularity of the regression function, and the effective dimension of the reproducing kernel Hilbert space. By choosing the regularization parameter appropriately according to the sample size, the risk bound yields a nearly optimal learning rate, up to a logarithmic factor, for strongly mixing sequences. We thus extend the applicability of spectral algorithms to non-i.i.d. sampling processes. In particular, we show that the learning rates for i.i.d. samples in the literature are recovered as a special case of our results, in which the mixing coefficient tends to zero.
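For concreteness, the following is a minimal sketch of the standard spectral-filter estimator that such algorithms share; the notation (the filter function $g_\lambda$ and the empirical operator $T_{\mathbf{x}}$) is assumed from the spectral regularization literature and is not specified in the abstract itself.

\[
  f_\lambda \;=\; g_\lambda(T_{\mathbf{x}})\,\frac{1}{n}\sum_{i=1}^{n} y_i\,K_{x_i},
  \qquad
  T_{\mathbf{x}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \langle \cdot, K_{x_i}\rangle\,K_{x_i},
\]

where $K_x = K(x,\cdot)$ and $g_\lambda$ is a filter acting on the spectrum of $T_{\mathbf{x}}$. Ridge regression corresponds to $g_\lambda(\sigma) = (\sigma+\lambda)^{-1}$, spectral cut-off to $g_\lambda(\sigma) = \sigma^{-1}\mathbf{1}\{\sigma \ge \lambda\}$, and Landweber iteration (with step size $1$ and $t \approx 1/\lambda$ steps) to $g_\lambda(\sigma) = \sum_{j=0}^{t-1}(1-\sigma)^j$.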