Abstract

Traditional analyses of online kernel learning assume that the training sequence is independently and identically distributed (i.i.d.). Recent studies reveal that when the loss function is smooth and strongly convex, given T i.i.d. training instances, a constant sampling complexity of random Fourier features is sufficient to ensure an O(log T / T) convergence rate of the excess risk, which is optimal in online kernel learning up to a log T factor. However, the i.i.d. assumption is often too strong in practice, which limits the applicability of these results. In this paper, we study the sampling complexity of random Fourier features in online kernel learning under non-i.i.d. assumptions. We prove that the sampling complexity under non-i.i.d. settings is also constant, but the convergence rate of the excess risk becomes O(log T / T + ϕ), where ϕ is the mixing coefficient measuring the degree to which the training sequence deviates from i.i.d. We conduct experiments on both artificial and real-world large-scale data sets to verify our theory.
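
As a rough illustration of the setting the abstract describes (not the paper's algorithm), the sketch below uses random Fourier features to approximate a Gaussian kernel and runs online gradient descent on the resulting finite-dimensional features. The feature count D, the bandwidth sigma, the squared loss, and the 1/t step-size schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rff(d, D, sigma):
    """Sample D random Fourier features approximating an RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) on d-dimensional inputs."""
    # Frequencies drawn from the kernel's spectral density, phases uniform.
    W = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    def phi(x):
        return np.sqrt(2.0 / D) * np.cos(W @ x + b)
    return phi

def online_rff_regression(stream, d, D=200, sigma=1.0, eta=0.1):
    """Online gradient descent on the squared loss in RFF space.
    The paper's analysis assumes smooth, strongly convex losses; a
    regularized variant would be used to match that setting exactly."""
    phi = make_rff(d, D, sigma)
    w = np.zeros(D)
    losses = []
    for t, (x, y) in enumerate(stream, start=1):
        z = phi(x)
        pred = w @ z
        losses.append(0.5 * (pred - y) ** 2)
        w -= (eta / t) * (pred - y) * z  # decaying step size
    return w, losses

# Usage on a synthetic stream: y = sin(first coordinate) + noise.
d = 5
stream = [(x, np.sin(x[0]) + 0.1 * rng.normal())
          for x in rng.normal(size=(1000, d))]
w, losses = online_rff_regression(stream, d)
print("mean loss over last 100 rounds:", np.mean(losses[-100:]))
```

Note that D is held constant as T grows, reflecting the constant sampling complexity discussed above; under a non-i.i.d. stream the averaged loss would be expected to plateau at a level governed by the mixing coefficient ϕ.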
