Abstract

Extensions of kernel methods to class imbalance problems have been extensively studied. Although they cope well with nonlinear problems, their high computation and memory costs severely limit their application to real-world imbalanced tasks. The Nyström method is an effective technique for scaling kernel methods. However, the standard Nyström method needs to sample a sufficiently large number of landmark points to ensure an accurate approximation, which seriously affects its efficiency. In this study, we propose a multi-Nyström method based on mixtures of Nyström approximations to avoid the explosion of the subkernel matrix, while the optimization of the mixture weights is embedded into the model training process by multiple kernel learning (MKL) algorithms to yield a more accurate low-rank approximation. Moreover, we select subsets of landmark points according to the imbalanced distribution to reduce the model's sensitivity to skewness. We also provide a kernel stability analysis of our method and show that the model solution error is bounded by weighted approximation errors, which can help us improve the learning process. Extensive experiments on several large-scale datasets show that our method achieves higher classification accuracy and a dramatic speedup of MKL algorithms.
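As background for the abstract, the standard Nyström approximation it builds on can be sketched in a few lines of NumPy. The sketch below is illustrative only (the RBF kernel, bandwidth, dataset, and landmark count are assumptions, not the paper's settings): sample m landmark points, form the n×m cross-kernel C and the m×m landmark kernel W, and take Φ = C W^{-1/2}, so that K ≈ ΦΦᵀ without ever forming the full n×n kernel matrix.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between the rows of X and the rows of Y.
    sq = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

def nystrom(X, m, gamma=1.0, seed=None):
    # Standard Nystrom: sample m landmarks, return a factor Phi with
    # K ~= Phi @ Phi.T, avoiding the full n x n kernel matrix.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    L = X[idx]                       # landmark points
    C = rbf_kernel(X, L, gamma)      # n x m cross-kernel
    W = rbf_kernel(L, L, gamma)      # m x m landmark kernel
    # Pseudo-inverse square root of W via its eigendecomposition;
    # near-zero eigenvalues are dropped for numerical stability.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10
    Winv_sqrt = vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T
    return C @ Winv_sqrt             # n x m feature map Phi

# Toy data: 300 points in 2-D, 60 landmarks.
X = np.random.default_rng(0).normal(size=(300, 2))
Phi = nystrom(X, m=60, gamma=0.2, seed=1)
K_approx = Phi @ Phi.T
K_exact = rbf_kernel(X, X, gamma=0.2)
err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

The cost drops from O(n²) kernel evaluations to O(nm) plus an m×m eigendecomposition, which is exactly why the number of landmarks m governs the accuracy/efficiency trade-off the abstract describes.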

Highlights

  • Real-world problems in computer vision [1], natural language processing [2, 3], and data mining [4, 5] present imbalanced traits in their data, which may arise from the inherent properties of the data or from external factors such as sampling bias or measurement error.

  • In [17], a kernel boundary alignment algorithm is proposed to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. The kernel-based adaptive synthetic data generation (KernelADASYN) for imbalanced learning is proposed in [18], which uses kernel density estimation (KDE) to estimate the adaptive oversampling density.

  • Without computing and storing the full kernel matrix, our method scales to large-scale scenarios. The main contributions of this study are summarized as follows: (1) We propose a multi-Nyström method to overcome the computational constraints of the Nyström method.
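A hedged sketch of how the ingredients named in these highlights could fit together: several small landmark subsets, drawn with per-class stratification so the minority class stays represented in a skewed distribution, each yield their own Nyström feature map, and a convex weight vector mixes the resulting approximations. The weights are fixed to uniform here; in the paper they are learned by MKL during training. All function names and the stratified sampling rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Pairwise RBF kernel between the rows of X and the rows of Y.
    sq = (np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

def nystrom_map(X, landmarks, gamma=1.0):
    # Feature map Phi with K ~= Phi @ Phi.T for one landmark subset.
    C = rbf(X, landmarks, gamma)
    W = rbf(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10                  # drop near-null directions
    return C @ vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T

def stratified_landmarks(X, y, m, rng):
    # Sample landmarks evenly per class, an assumed stand-in for the
    # paper's imbalance-aware landmark selection.
    classes = np.unique(y)
    per = max(1, m // len(classes))
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c),
                   size=min(per, int(np.sum(y == c))), replace=False)
        for c in classes])
    return X[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (rng.random(400) < 0.1).astype(int)   # ~10% minority class
# Three small landmark subsets, one Nystrom approximation each.
maps = [nystrom_map(X, stratified_landmarks(X, y, 20, rng), gamma=0.2)
        for _ in range(3)]
mu = np.ones(3) / 3                       # uniform mixture weights (MKL would learn these)
K_mix = sum(w * (P @ P.T) for w, P in zip(mu, maps))
```

Each subkernel stays at most 20×20, so no single large subkernel matrix is ever formed; the mixture K_mix remains symmetric positive semidefinite because it is a convex combination of PSD terms.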


Introduction

Real-world problems in computer vision [1], natural language processing [2, 3], and data mining [4, 5] present imbalanced traits in their data, which may arise from the inherent properties of the data or from external factors such as sampling bias or measurement error. The minority class in these problems is usually more important, and its misclassification more costly, than the majority class. Complex nonlinear structures commonly exist in such real-world imbalanced data, and in this setting extensions of kernel methods for class imbalance problems have proven very effective [13, 14, 15]. In [17], a kernel boundary alignment algorithm is proposed to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. However, with the development of data storage and data acquisition equipment, the scale of data continues to grow, and existing kernel-based class imbalanced learning (kernel CIL) methods face a serious challenge: the cost of computing and storing a vast kernel matrix is prohibitive.

