Abstract

Deep multiple kernel learning (DMKL) has attracted widespread attention because it achieves better results than shallow multiple kernel learning. However, existing DMKL methods use a fixed number of layers and fixed kernel types, so they adapt poorly to different data sets and make it difficult to find model parameters that improve test accuracy. In this paper, we propose a self-adaptive deep multiple kernel learning (SA-DMKL) method. SA-DMKL adapts the model in two ways: it optimizes the parameters of each kernel function with a grid search, and it changes the number and types of kernel functions in each layer according to a generalization bound evaluated with Rademacher chaos complexity. Experiments on three datasets from the University of California, Irvine (UCI) and on the Caltech 256 image dataset validate the effectiveness of the proposed method in three respects.

Highlights

  • The success of the Support Vector Machine (SVM) [1] has drawn increasing attention to kernel methods [2,3,4]

  • We propose a model learning algorithm that adapts the model by tuning the parameters of each kernel function with a grid search and by changing the number and types of kernel functions in each layer according to a generalization bound evaluated with Rademacher chaos complexity

  • We evaluate each base kernel function using the generalization bound based on Rademacher chaos complexity and drop the base kernels with the larger generalization bounds

Summary

Introduction

The success of the Support Vector Machine (SVM) [1] has drawn increasing attention to kernel methods [2,3,4]. The kernel trick makes it easy to generalize a linear learning method to a nonlinear one: the method operates in a high-dimensional, implicit feature space without explicitly computing the coordinates of the data in that space. These single kernel methods are based on a single feature space. Cho et al. developed a multilayer kernel machine (MKM) that mimics the computation in large neural nets with a family of arc-cosine kernel functions [15]. These arc-cosine kernels are combined by repeated composition across multiple layers. The arc-cosine kernel does not admit hyper-parameters beyond the first layer.
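To make the layered composition of arc-cosine kernels concrete, the following is a minimal sketch, not the MKM or SA-DMKL implementation. It assumes the standard closed forms for the degree-0 and degree-1 arc-cosine kernels, k(x, y) = (1/pi) * (|x| |y|)^n * J_n(theta), with J_0(theta) = pi - theta and J_1(theta) = sin(theta) + (pi - theta) cos(theta). The function name `arc_cosine_kernel` and its `degree` and `layers` parameters are hypothetical names chosen for illustration, and the same degree is reused at every layer.

```python
import numpy as np


def _angular_term(theta, degree):
    """J_n(theta) for the arc-cosine kernel (only n = 0 and n = 1 here)."""
    if degree == 0:
        return np.pi - theta
    if degree == 1:
        return np.sin(theta) + (np.pi - theta) * np.cos(theta)
    raise ValueError("this sketch only implements degrees 0 and 1")


def arc_cosine_kernel(x, y, degree=1, layers=1):
    """Arc-cosine kernel of two nonzero vectors, composed `layers` times.

    Each layer only needs the previous layer's values k(x, x), k(y, y)
    and k(x, y), so the implicit feature map is never computed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kxx, kyy, kxy = float(x @ x), float(y @ y), float(x @ y)
    if kxx == 0.0 or kyy == 0.0:
        raise ValueError("inputs must be nonzero vectors")

    for _ in range(layers):
        norm = np.sqrt(kxx * kyy)
        # Angle between the two points in the previous layer's feature space;
        # clipping guards against rounding slightly outside [-1, 1].
        theta = np.arccos(np.clip(kxy / norm, -1.0, 1.0))
        kxy = (norm ** degree) * _angular_term(theta, degree) / np.pi
        # J_n(0) = pi for n = 0 and n = 1, so k(x, x) becomes (k(x, x))^n.
        kxx, kyy = kxx ** degree, kyy ** degree
    return kxy


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print(arc_cosine_kernel(a, b, degree=1, layers=1))
    print(arc_cosine_kernel(a, b, degree=1, layers=2))
```

Apart from the degree and the number of layers, the sketch has no tunable parameters, which matches the remark above about hyper-parameters.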
