Study on Unbalanced Binary Classification with Unknown Misclassification Costs

J Gao, J Y Wang, Z C Mo, L Gong

doi:10.1109/ieem.2018.8607671

Abstract

With the rapid development of big data and machine learning technologies, many fields have begun to adopt related algorithms and methods. Classification algorithms are widely used in financial risk identification, fault diagnosis, medical diagnosis, and similar applications. However, the datasets in these settings are often unbalanced, and conventional methods fail to classify instances correctly. Many methods, such as over-sampling, under-sampling, and ensemble methods, have been proposed to improve classifier performance, but which one to choose for a given dataset remains an open problem. This paper therefore aims to reach an experimental conclusion on which kind of method generally performs best on unbalanced classification problems. Specifically, we evaluate the performance of 13 methods for unbalanced classification on several unbalanced datasets that differ in the number of instances and in the ratio of positive instances, and draw a conclusion from the results.
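
To make the resampling families named in the abstract concrete, the following is a minimal sketch of random over-sampling and random under-sampling on a toy dataset. It is an illustration only and is not taken from the paper: the function names `random_oversample` and `random_undersample`, the toy data, and the choice of plain random resampling are all assumptions, and the 13 methods evaluated in the paper are not reproduced here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has
    as many rows as the smallest class.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):  # sample without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Toy unbalanced dataset (assumed): 2 positive and 8 negative instances.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("over-sampled class counts: ", dict(Counter(y_over)))
print("under-sampled class counts:", dict(Counter(y_under)))
```

Over-sampling keeps every original instance but repeats minority rows, whereas under-sampling discards majority rows; which trade-off works better on a given dataset is the kind of question the experiments address.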

Full Text