Abstract

Datasets with imbalanced class distributions occur frequently in many real-world domains and can hinder a variety of machine learning tasks. Among these tasks, learning classifiers from imbalanced datasets is an important topic. Performing it well requires a distance metric that accurately measures similarities between samples drawn from imbalanced datasets. Unfortunately, existing distance metric learning methods, such as large margin nearest neighbor and information-theoretic metric learning, focus only on distances between samples and fail to take imbalanced class distributions into account. Traditional distance metrics naturally favor the majority classes, which satisfy their objective functions more easily, while the important minority classes are neglected during metric construction; this severely degrades the decision systems of most classifiers. Learning an appropriate distance metric that can handle imbalanced datasets is therefore of vital importance, but challenging. To address this problem, this paper proposes a novel distance metric learning method named Distance Metric by Balancing KL-divergence (DMBK). DMBK defines normalized divergences, based on KL-divergence, to describe the distinctions between different classes. It then combines these normalized divergences through their geometric mean and separates samples from different classes simultaneously. This procedure separates all classes in a balanced way and avoids the inaccurate similarities incurred by imbalanced class distributions. Experiments on a variety of imbalanced datasets verify the excellent performance of the proposed method.
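
The abstract describes the construction only in words, so the sketch below illustrates one plausible reading of the balancing idea, under stated assumptions: each class is modelled as a Gaussian, the metric is a Mahalanobis metric M = L^T L parameterized by a linear map L, and "balancing" is read as evaluating the geometric mean of the normalized between-class KL divergences. The names kl_gaussian, balanced_kl_objective, and L are illustrative and are not the paper's notation; this is not the authors' exact formulation.

```python
# Minimal sketch of a balanced-KL criterion, assuming Gaussian class-conditional
# distributions and a Mahalanobis metric M = L^T L. Names are illustrative only.
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)
        + diff @ cov1_inv @ diff
        - k
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    )


def balanced_kl_objective(L, X, y, eps=1e-12):
    """Log of the geometric mean of normalized between-class KL divergences
    in the space induced by the linear map L (metric M = L.T @ L)."""
    Z = X @ L.T                      # project samples into the metric space
    classes = np.unique(y)
    stats = {c: (Z[y == c].mean(axis=0), np.cov(Z[y == c], rowvar=False))
             for c in classes}
    # Pairwise divergences between all distinct class pairs.
    divs = np.array([
        kl_gaussian(*stats[ci], *stats[cj])
        for i, ci in enumerate(classes)
        for cj in classes[i + 1:]
    ])
    norm_divs = divs / (divs.sum() + eps)   # normalized divergences sum to 1
    # The geometric mean is largest when the normalized divergences are equal,
    # which is what pushes the metric to separate all classes evenly.
    return np.mean(np.log(norm_divs + eps))


# Toy usage: evaluate the objective for a random linear metric on synthetic data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(40, 3)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 40)
L = rng.normal(size=(3, 3))
print(balanced_kl_objective(L, X, y))
```

Because the normalized divergences sum to one, their geometric mean is largest when they are all equal, which is what favors separating every pair of classes, including the minority ones, rather than only the easiest majority pairs. An actual learner would maximize such an objective over L (for example by gradient ascent under a suitable constraint), which this sketch omits.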
