Imbalanced datasets are a central concern in classification. Data-level methods, the mainstream approach to handling imbalance, suffer from two problems at once: they depend on subjective parameters that are difficult to set, and the new minority samples they generate can be inconsistent with the original minority samples. To address both problems, this paper develops an extended belief-rule-based (EBRB) system, a white-box classifier, with a hybrid sampling strategy. The strategy combines an under-sampling process and an oversampling process, neither of which involves subjective parameters. The under-sampling process identifies and removes overlapping majority rules by iteratively determining an appropriate objective threshold for calculating the inconsistency degree of the rule base, and then identifies and removes redundant non-overlapping majority rules using the density of non-overlapping rules in clustered groups. The oversampling process uses a differential-evolution-based iterative procedure to generate new minority rules within groups by minimizing the inconsistency of the rule base. The distribution of the original dataset is preserved as far as possible by balancing rules within clustered majority groups and clustered minority groups, respectively. The EBRB system is applied to the auxiliary diagnosis of thyroid nodules, and comparisons with existing EBRB systems, representative data-level methods, and algorithm-level methods highlight its superior performance.
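The oversampling idea can be illustrated with a generic differential evolution (DE/rand/1/bin) loop. This is only a minimal sketch and not the paper's procedure: the abstract does not define the rule-base inconsistency measure, so the fitness below is a stand-in assumption (distance to the nearest majority sample, larger is better). The function name `de_oversample` and the parameters `f`, `cr`, and `n_iter` are hypothetical and exist only for this illustration; in the paper's setting, no such subjective parameters are said to be involved.

```python
import numpy as np

def de_oversample(minority, majority, n_new, f=0.5, cr=0.9, n_iter=20, seed=0):
    """Generate n_new synthetic minority samples with a DE/rand/1/bin loop.

    Assumption: the fitness is a placeholder for the paper's inconsistency
    objective, which the abstract does not specify. Requires at least three
    minority samples.
    """
    rng = np.random.default_rng(seed)
    lo, hi = minority.min(axis=0), minority.max(axis=0)
    d = minority.shape[1]

    def fitness(x):
        # Stand-in objective: distance to the nearest majority sample.
        return np.min(np.linalg.norm(majority - x, axis=1))

    # Start from random copies of existing minority samples.
    pop = minority[rng.integers(0, len(minority), size=n_new)].astype(float)
    fit = np.array([fitness(x) for x in pop])

    for _ in range(n_iter):
        for i in range(n_new):
            # Mutation: combine three distinct minority samples.
            a, b, c = minority[rng.choice(len(minority), size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)
            # Binomial crossover, forcing at least one mutant coordinate.
            mask = rng.random(d) < cr
            mask[rng.integers(d)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep the trial only if it is no worse.
            trial_fit = fitness(trial)
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit
    return pop
```

Clipping to the per-feature range of the original minority samples keeps each synthetic sample inside the region the minority class already occupies, which is one simple way to limit inconsistency between new and original samples.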