A new rule-based knowledge extraction approach for imbalanced datasets

Aouatef Mahani, Ahmed Riadh Baba-Ali

doi:10.1007/s10115-019-01330-9

Abstract

Classification consists of extracting a classifier from large datasets. A dataset is imbalanced if one class contains many more instances than the others; its instances are then divided into majority instances and minority ones. Classical learning algorithms are biased toward the majority instances. Classification applied to imbalanced datasets is called partial classification, and its approaches are generally based on sampling methods or algorithmic methods. In this paper, we propose a new hybrid approach using a three-phase rule-based extraction process. First, an initial classifier is extracted; it contains classification rules representing only majority instances. We then delete the majority instances that are well classified by these rules to produce a balanced dataset. The deleted majority instances are replaced by the extracted classification rules, which prevents any information loss. Next, our algorithm is applied to the resulting balanced dataset to produce a second classifier, which contains rules representing both majority and minority instances. Finally, we add the rules of the first classifier to the second classifier to obtain the final classifier, which is then post-processed. Our approach has been tested on several imbalanced binary datasets. The results show its efficiency compared with other results.
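
The three phases described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the rule-extraction algorithm or the post-processing step, so everything below is an assumption made for illustration: the `Rule` type, the two stand-in learners (`majority_only_rules` and `stump_rules`), the first-match ordering in `predict`, and the toy data are all hypothetical and are not the authors' method. Post-processing is omitted.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]  # (feature vector, class label)

@dataclass(frozen=True)
class Rule:
    feature: int  # index of the tested feature
    low: float    # the rule fires when low <= x[feature] <= high
    high: float
    label: str    # class predicted when the rule fires

    def covers(self, x: Sequence[float]) -> bool:
        return self.low <= x[self.feature] <= self.high

def majority_only_rules(data: List[Instance], maj: str) -> List[Rule]:
    """Phase 1 stand-in (hypothetical): one-feature regions holding no minority
    instance. Assumes both classes are present in `data`."""
    minority = [x for x, y in data if y != maj]
    rules = []
    for j in range(len(data[0][0])):
        lo_min = min(x[j] for x in minority)
        hi_min = max(x[j] for x in minority)
        below = [x[j] for x, y in data if y == maj and x[j] < lo_min]
        above = [x[j] for x, y in data if y == maj and x[j] > hi_min]
        if below:
            rules.append(Rule(j, -inf, max(below), maj))
        if above:
            rules.append(Rule(j, min(above), inf, maj))
    return rules

def stump_rules(data: List[Instance], maj: str, mino: str) -> List[Rule]:
    """Phase 2 stand-in (hypothetical): the single-threshold split with the most
    correctly classified instances, expressed as one rule per class."""
    best = None
    for j in range(len(data[0][0])):
        values = sorted({x[j] for x, _ in data})
        for left_max, right_min in zip(values, values[1:]):
            for left, right in ((maj, mino), (mino, maj)):
                hits = sum((left if x[j] <= left_max else right) == y
                           for x, y in data)
                if best is None or hits > best[0]:
                    best = (hits, [Rule(j, -inf, left_max, left),
                                   Rule(j, right_min, inf, right)])
    return best[1] if best else []

def three_phase(data: List[Instance], maj: str, mino: str):
    # Phase 1: rules that represent only majority instances.
    first = majority_only_rules(data, maj)
    # Delete the majority instances those rules classify correctly.
    reduced = [(x, y) for x, y in data
               if not (y == maj and any(r.covers(x) for r in first))]
    # Phase 2: rules for both classes, learned on the reduced dataset.
    second = stump_rules(reduced, maj, mino)
    # Phase 3: merge both rule sets (the ordering here is an assumption).
    return first + second, reduced

def predict(rules: List[Rule], x: Sequence[float], default: str) -> str:
    """Return the label of the first rule that covers x, else `default`."""
    for r in rules:
        if r.covers(x):
            return r.label
    return default

# Toy imbalanced data: 8 majority ("neg") and 3 minority ("pos") instances.
toy = [((1.0, 5.0), "neg"), ((2.0, 5.5), "neg"), ((3.0, 4.5), "neg"),
       ((4.0, 5.0), "neg"), ((5.0, 6.0), "neg"), ((6.0, 2.0), "neg"),
       ((7.2, 4.9), "neg"), ((7.4, 5.0), "neg"),
       ((7.0, 5.0), "pos"), ((7.5, 5.2), "pos"), ((8.0, 4.8), "pos")]

rules, reduced = three_phase(toy, "neg", "pos")
print(len(reduced), "instances remain after phase 1")
print(predict(rules, (7.8, 5.0), default="neg"))
```

On this toy data the phase 1 rules absorb six of the eight majority instances, so the second learner sees a dataset of two majority and three minority instances. The removed instances remain represented in the final classifier through the phase 1 rules, which mirrors the abstract's point about avoiding information loss.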


Journal: Knowledge and Information Systems
Publication Date: Jan 25, 2019
Citations: 7

