Rare Event Prediction Using Similarity Majority Under-Sampling Technique

Jinyan Li, Nilanjan Dey, Raymond K Wong, Shimin Hu, Simon Fong, Sabah Mohammed, Victor W Chu

doi:10.1007/978-981-10-7242-0_3

Abstract

In data mining, it is not uncommon to face an imbalanced classification problem in which the samples of interest are rare. Training data with too many ordinary samples and too few rare ones misleads the classifier: it over-fits by learning too much from the majority class and under-fits the minority class, which it then lacks the power to recognize. This work studies a novel rebalancing technique that under-samples the majority class (reduces its size by sampling) to ease the imbalanced class distribution without synthesizing extra training samples. This simple method is called the Similarity Majority Under-Sampling Technique (SMUTE). By measuring the similarity between each majority class sample and its surrounding minority class samples, SMUTE effectively discriminates between majority and minority class samples while taking care not to change too much of the underlying non-linear mapping between the input variables and the target classes. Two experiments are conducted and reported in this paper: one is an extensive performance comparison of SMUTE with state-of-the-art methods on generated imbalanced data; the other uses real data representing a case of natural disaster prevention in which accident samples are rare. SMUTE is found to compare favourably with the other methods in both cases.
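
The core idea, scoring each majority class sample by its similarity to nearby minority class samples and then under-sampling on that score, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the similarity measure, the neighbourhood size, or which majority samples are retained, so the sketch assumes cosine similarity, treats the "surrounding" minority samples as the k most similar ones, and leaves the retention direction as a flag. The names `cosine`, `undersample_by_similarity`, `n_keep`, `k`, and `keep_most_similar` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def undersample_by_similarity(majority, minority, n_keep, k=3,
                              keep_most_similar=False):
    """Return sorted indices of the majority samples to retain.

    Assumptions (not taken from the abstract): cosine similarity is the
    measure, the neighbourhood is the k most similar minority samples,
    and the caller chooses which end of the ranking to keep.
    """
    if not minority:
        raise ValueError("at least one minority sample is required")

    scores = []
    for x in majority:
        # Similarities to every minority sample, highest first.
        sims = sorted((cosine(x, m) for m in minority), reverse=True)
        nearest = sims[:k]
        # Score = mean similarity to the k most similar minority samples.
        scores.append(sum(nearest) / len(nearest))

    # Rank majority samples by score; ascending keeps the least similar first.
    order = sorted(range(len(majority)), key=lambda i: scores[i],
                   reverse=keep_most_similar)
    return sorted(order[:n_keep])


# Toy data: four majority samples, two minority samples.
majority = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
minority = [[1.0, 0.0], [0.9, 0.1]]
kept = undersample_by_similarity(majority, minority, n_keep=2, k=2)
reduced_majority = [majority[i] for i in kept]
```

Only existing majority samples are selected, so no synthetic training data is created, in line with the abstract. Whether the most or the least similar majority samples should be kept is a design choice that the abstract leaves open, which is why the sketch exposes it as a parameter.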
