Abstract

Classification of imbalanced datasets is frequently encountered in practical applications. The data to be classified are skewed, i.e., the samples of one class (the minority class) are far fewer than those of the other classes (the majority classes). When dealing with imbalanced datasets, most classifiers share a common limitation: they often achieve better classification performance on the majority classes than on the minority class. To alleviate this limitation, this study proposes a fuzzy rule-based modeling approach using information granules. Information granules, as entities derived and abstracted from data, can be used to describe and capture the characteristics (distribution and structure) of data from both majority and minority classes. Since the geometric characteristics of information granules depend on the distance measures used in the granulation process, the main idea of this study is to construct information granules on each class of imbalanced data using Minkowski distance measures and then to establish the classification models by using “If-Then” rules. Experimental results on synthetic and publicly available datasets show that the proposed Minkowski distance-based method can produce information granules with a variety of geometric shapes and construct granular models with satisfactory classification performance for imbalanced datasets.

Highlights

  • As one of the key components of machine learning, fuzzy rule-based classifiers [1,2,3] explore the features of data by constructing fuzzy sets with strong generalization ability and extracting fuzzy rules with good interpretability

  • We proposed a Minkowski distance-based granular classification method

  • Another reason is that the information granules that make up each union information granule are produced based on Minkowski distance with various values of p, which results in the generated information granules having various geometric shapes

Summary

Introduction

As one of the key components of machine learning, fuzzy rule-based classifiers [1,2,3] explore the features of data by constructing fuzzy sets with strong generalization ability and extracting fuzzy rules with good interpretability. The information granules in different Minkowski spaces are constructed based on a spectrum of Minkowski distances, which can reveal the geometric structure of both the majority class and the minority class of data. At the first stage of our Minkowski distance-based granular classification method, the imbalanced dataset is divided into two partitions according to class labels, viz., the majority class and the minority class. The granular Minkowski distance-based classification model for imbalanced datasets is then constructed, and two “If-Then” rules emerge to articulate the granular description of each partition and its minority or majority class label.
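
The first stage and the distance measure mentioned above can be illustrated with a minimal sketch. It assumes a binary dataset held in NumPy arrays; the function names `minkowski_distance` and `split_by_class` are hypothetical and are not taken from the paper. The sketch does not build information granules or rules; it only shows the standard Minkowski distance of order p and a label-based split.

```python
import numpy as np


def minkowski_distance(x, y, p):
    """Standard Minkowski distance of order p (p >= 1, or np.inf)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        # Limit case p -> infinity: Chebyshev distance
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))


def split_by_class(X, y):
    """Split samples into (majority, minority) partitions by label count.

    Assumption: exactly two classes with different sample counts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    majority_label = labels[np.argmax(counts)]
    minority_label = labels[np.argmin(counts)]
    return X[y == majority_label], X[y == minority_label]


# Toy data (made up for illustration): three majority and two minority samples
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1])
majority, minority = split_by_class(X, y)

# The same pair of points is at a different distance for each p,
# which is why the choice of p changes the shape of a distance-based region.
for p in (1, 2, np.inf):
    print(p, minkowski_distance([0.0, 0.0], [3.0, 4.0], p))
```

In two dimensions, the set of points at a fixed Minkowski distance from a center is a diamond for p = 1, a circle for p = 2, and an axis-aligned square as p tends to infinity, which is consistent with the varied geometric shapes the text attributes to the granules.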

Information Granules and Minkowski Distance
Minkowski Distance
The Representation of Information Granules
The Distance Measure and Merging Method between Information Granules
The Proposed Fuzzy Granular Classification Methods for Imbalanced Datasets
The Construction of Information Granules for Each Class
Result
The Emergence and Evaluation of the Minkowski Distance-Based Fuzzy Granular
Experiment Studies and Discussion
Synthetic Datasets
Method
Findings
Conclusions
