Abstract

Traditional analyses of online kernel learning assume that the training sequence is independently and identically distributed (i.i.d.). Recent studies reveal that when the loss function is smooth and strongly convex, given T i.i.d. training instances, a constant sampling complexity of random Fourier features is sufficient to ensure an O(log T / T) convergence rate of the excess risk, which is optimal in online kernel learning up to a log T factor. However, the i.i.d. assumption is often too strong in practice, which greatly limits the value of these results. In this paper, we study the sampling complexity of random Fourier features in online kernel learning under non-i.i.d. assumptions. We prove that the sampling complexity under non-i.i.d. settings is also constant, but the convergence rate of the excess risk becomes O(log T / T + ϕ), where ϕ is the mixing coefficient measuring the extent to which the training sequence deviates from the i.i.d. setting. We conduct experiments on both artificial and real large-scale data sets to verify our theory.
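
To make the setting concrete, the following is a minimal sketch (not the authors' algorithm) of online kernel learning with random Fourier features: a Gaussian kernel is approximated by a fixed number D of random features that does not grow with the stream length T, and online gradient descent is run on a smooth, strongly convex (regularized squared) loss in the feature space. The kernel choice, loss, step size, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10       # input dimension
D = 100      # number of random Fourier features (constant, independent of T)
gamma = 1.0  # Gaussian kernel: k(x, y) = exp(-gamma * ||x - y||^2)
lam = 0.01   # regularization strength, makes the loss strongly convex

# Sample the random features once; D stays fixed over the whole stream.
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(x):
    """Map x to the D-dimensional random Fourier feature space."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

w = np.zeros(D)  # linear model in feature space, approximating a function in the RKHS

T = 5000
for t in range(1, T + 1):
    # Toy synthetic stream; a real (possibly non-i.i.d.) data source goes here.
    x = rng.normal(size=d)
    y = np.sin(x[0]) + 0.1 * rng.normal()

    z = rff(x)
    pred = w @ z

    # Regularized squared loss: smooth and strongly convex in w.
    grad = (pred - y) * z + lam * w
    eta = 1.0 / (lam * t)  # standard step size for strongly convex online learning
    w -= eta * grad
```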
