Exploiting Domain Knowledge to Address Class Imbalance in Meteorological Data Mining

Evangelos Tsagalidis, Georgios Evangelidis

doi:10.3390/app122312402

Abstract

We address the problem of class imbalance in data mining and machine learning classification algorithms. Class imbalance arises when some class labels are represented by far fewer examples in the training dataset than the remaining class labels. Usually, those minority class labels are the most important ones, so classifiers should above all predict them well. This is a well-studied problem, and various sampling-based strategies are used to balance the representation of the labels in the training dataset and improve classifier performance. We explore whether expert knowledge in the field of Meteorology can enhance the quality of the training dataset when it is treated by pre-processing sampling strategies. We propose four new sampling strategies based on our expertise in the data domain and compare their effectiveness against the established sampling strategies used in the literature. Our sampling strategies, which take advantage of expert knowledge from the data domain, achieve class balancing that improves the performance of most classifiers.
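To make the idea of balancing by sampling concrete, the following is a minimal sketch of random oversampling, one of the generic sampling approaches of the kind the abstract refers to. It is not one of the four domain-knowledge strategies proposed in the paper, which the abstract does not describe. The function name `random_oversample`, the toy data, and the labels `event` and `no_event` are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class.

    Illustrative only: a generic baseline, not the paper's method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_x, out_y = list(features), list(labels)
    for label, count in counts.items():
        # All original examples carrying this label.
        pool = [x for x, y in zip(features, labels) if y == label]
        # Draw with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y


# Hypothetical toy data: 8 "no_event" examples versus 2 "event" examples.
X = [[i, i * 0.5] for i in range(10)]
y = ["no_event"] * 8 + ["event"] * 2

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

Oversampling of this kind only repeats existing minority examples, so it adds no new information to the training dataset; that limitation is one reason to ask, as the abstract does, whether domain knowledge can guide the sampling instead.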

Journal: Applied Sciences
Publication Date: Dec 4, 2022
Citations: 1
License type: CC BY 4.0

Similar Papers

Feature selection for high dimensional imbalanced class data based on F-measure optimization
Chunkai Zhang ... Ying Zhou
01 Dec 2017

K Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification
S Santha Subbulaxmi ... G Arumugam
International Journal of Engineering and Advanced Technology | VOL. 9
28 Feb 2020

Learning to improve medical decision making from imbalanced data without a priori cost.
Xiang Wan ... Jiming Liu
BMC Medical Informatics and Decision Making | VOL. 14
01 Dec 2014

A Review on Handling Imbalanced Data
Vimalraj S Spelmen ... R Porkodi
01 Mar 2018
