Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method

Chen Fu, Jianhua Yang

doi:10.3390/a14020054

Abstract

The problem of classification for imbalanced datasets is frequently encountered in practical applications. The data to be classified in this problem are skewed, i.e., the samples of one class (the minority class) are far fewer than those of the other classes (the majority classes). When dealing with imbalanced datasets, most classifiers share a common limitation: they often achieve better classification performance on the majority classes than on the minority class. To alleviate this limitation, this study proposes a fuzzy rule-based modeling approach using information granules. Information granules, as entities derived and abstracted from data, can describe and capture the characteristics (distribution and structure) of data from both the majority and minority classes. Since the geometric characteristics of information granules depend on the distance measures used in the granulation process, the main idea of this study is to construct information granules on each class of imbalanced data using Minkowski distance measures and then to establish the classification models by using “If-Then” rules. The experimental results on synthetic and publicly available datasets show that the proposed Minkowski distance-based method can produce information granules with a variety of geometric shapes and construct granular models with satisfactory classification performance for imbalanced datasets.

Highlights

As one of the key components of machine learning, fuzzy rule-based classifiers [1,2,3] explore the features of data by constructing fuzzy sets with strong generalization ability and extracting fuzzy rules with good interpretability
We proposed a Minkowski distance-based granular classification method
Another reason is that the information granules that make up each union information granule are produced based on Minkowski distance with various values of p, which results in the generated information granules having various geometric shapes

Summary

Introduction

As one of the key components of machine learning, fuzzy rule-based classifiers [1,2,3] explore the features of data by constructing fuzzy sets with strong generalization ability and extracting fuzzy rules with good interpretability. The information granules in different Minkowski spaces are constructed based on a spectrum of Minkowski distances, which can reveal the geometric structure of both the majority class and the minority class of data. At the first stage of our Minkowski distance-based granular classification method, the imbalanced dataset is divided into two partitions according to class label, viz., the majority class and the minority class. The granular Minkowski distance-based classification model for imbalanced datasets is then constructed, and two “If-Then” rules emerge to articulate the granular description of each partition and its minority or majority class label.
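
The two stages outlined above (splitting the data by class label, then attaching one “If-Then” rule to each partition) can be illustrated with a small sketch. This is not the authors' construction: the summary does not specify how granules are built, so the sketch assumes a deliberately simple, hypothetical granule (the class centroid plus the largest Minkowski distance from that centroid to any class member). The names `partition_by_label`, `build_granule` and `classify` are invented for illustration.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p (finite p >= 1) between two points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def partition_by_label(samples, labels):
    """Split a two-class dataset into (majority, minority) partitions."""
    groups = {}
    for point, label in zip(samples, labels):
        groups.setdefault(label, []).append(point)
    # Assumption: exactly two classes of different sizes.
    ordered = sorted(groups, key=lambda k: len(groups[k]))
    minority, majority = ordered[0], ordered[-1]
    return (groups[majority], majority), (groups[minority], minority)

def build_granule(points, p):
    """Hypothetical granule: centroid plus a radius that covers every point."""
    dim = len(points[0])
    center = [sum(pt[i] for pt in points) / len(points) for i in range(dim)]
    radius = max(minkowski(pt, center, p) for pt in points)
    return {"center": center, "radius": radius, "p": p}

def classify(x, rules):
    """Fire the rule 'If x is close to granule G Then label' with the best match."""
    def score(rule):
        granule, _ = rule
        dist = minkowski(x, granule["center"], granule["p"])
        return dist / (granule["radius"] or 1.0)  # distance relative to granule size
    return min(rules, key=score)[1]

# Toy imbalanced data: six majority samples (label 0), two minority samples (label 1).
samples = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8), (5, 5), (5.5, 5.2)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

(maj_pts, maj_label), (min_pts, min_label) = partition_by_label(samples, labels)
rules = [(build_granule(maj_pts, p=2), maj_label),
         (build_granule(min_pts, p=2), min_label)]

pred_minority = classify((5.1, 5.0), rules)
pred_majority = classify((0.4, 0.4), rules)
```

Scaling each distance by the granule's radius keeps the small minority granule from being dominated by the larger majority granule, which is one simple way to respect the imbalance; the paper's own granules and rule evaluation may differ.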

Information Granules and Minkowski Distance

Minkowski Distance
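
For reference, the Minkowski distance of order p between two points x and y is the p-th root of the sum of |x_i - y_i|^p over all coordinates. The sketch below is a minimal, standard-library illustration of that standard definition; the function name `minkowski_distance` is chosen here and is not taken from the paper. It shows how the same pair of points gets a different distance as p changes, which is why granules built with different p take different geometric shapes.

```python
import math

def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length points."""
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limiting case p -> infinity: Chebyshev distance.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
d_manhattan = minkowski_distance(a, b, 1)         # p = 1
d_euclidean = minkowski_distance(a, b, 2)         # p = 2
d_chebyshev = minkowski_distance(a, b, math.inf)  # p = infinity
```

In two dimensions, the set of points at a fixed distance from a center is a diamond for p = 1, a circle for p = 2, and an axis-aligned square as p tends to infinity.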

The Representation of Information Granules

The Distance Measure and Merging Method between Information Granules

The Proposed Fuzzy Granular Classification Methods for Imbalanced Datasets

The Construction of Information Granules for Each Class

Result

The Emergence and Evaluation of the Minkowski Distance-Based Fuzzy Granular

Experiment Studies and Discussion

Synthetic Datasets

Method

Findings

Conclusions


Journal: Algorithms	Publication Date: Feb 7, 2021
Citations: 7	License type: CC BY 4.0
