Weighted support vector machine for extremely imbalanced data

Jongmin Mun, Sungwan Bang, Jaeoh Kim

doi:10.1016/j.csda.2024.108078

Abstract

Based on an asymptotically optimal weighted support vector machine (SVM) that introduces label shift, a systematic procedure is derived for applying oversampling and weighted SVM to extremely imbalanced datasets with a cluster-structured positive class. This method formalizes three intuitions: (i) oversampling should reflect the structure of the positive class; (ii) weights should account for both the imbalance and oversampling ratios; (iii) synthetic samples should carry less weight than the original samples. The proposed method generates synthetic samples from the estimated positive class distribution using a Gaussian mixture model. To prevent overfitting to excessive synthetic samples, different misclassification penalties are assigned to the original positive class, synthetic positive class, and negative class. The proposed method is numerically validated through simulations and an analysis of Republic of Korea Army artillery training data.
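
The three intuitions in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data, the known cluster labels (the paper estimates the positive class distribution with a Gaussian mixture model; this sketch swaps in a per-cluster diagonal Gaussian fit), the ratio `rho`, the penalty formulas for `c_orig` and `c_syn`, and the linear SVM trained by stochastic subgradient descent.

```python
import random

def fit_cluster_gaussians(points, labels):
    """Per-cluster mean and per-feature std (simplified stand-in for a GMM fit)."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    params = []
    for members in clusters.values():
        n, d = len(members), len(members[0])
        mean = [sum(m[j] for m in members) / n for j in range(d)]
        std = [(sum((m[j] - mean[j]) ** 2 for m in members) / n) ** 0.5
               for j in range(d)]
        params.append((n, mean, std))
    return params

def oversample(params, n_new, rng):
    """Draw synthetic positives, picking clusters in proportion to their size."""
    sizes = [n for n, _, _ in params]
    out = []
    for _ in range(n_new):
        _, mean, std = rng.choices(params, weights=sizes)[0]
        out.append([rng.gauss(m, s) for m, s in zip(mean, std)])
    return out

def train_weighted_linear_svm(X, y, penalties, lam=0.01, epochs=50, lr=0.01, seed=0):
    """SGD on lam/2 * ||w||^2 + mean_i c_i * max(0, 1 - y_i * (w . x_i + b))."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The hinge term contributes a subgradient only inside the margin.
            g = penalties[i] * y[i] if margin < 1 else 0.0
            w = [wj - lr * (lam * wj - g * xj) for wj, xj in zip(w, X[i])]
            b += lr * g
    return w, b

# Toy data (hypothetical): 200 negatives, 10 positives in two tight clusters.
rng = random.Random(0)
neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
pos, pos_cluster = [], []
for k, (cx, cy) in enumerate([(3.0, 3.0), (3.0, -3.0)]):
    for _ in range(5):
        pos.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
        pos_cluster.append(k)  # assumed known here; the paper estimates structure

# (i) Oversampling follows the cluster structure of the positive class.
n_syn, rho = 40, 0.5  # rho: assumed weight of a synthetic vs. an original positive
synthetic = oversample(fit_cluster_gaussians(pos, pos_cluster), n_syn, rng)

# (ii) Penalties depend on both the imbalance and the oversampling ratio;
# (iii) synthetic samples get a smaller penalty than original positives.
# Illustrative choice: total positive penalty mass equals total negative mass.
c_neg = 1.0
c_orig = len(neg) / (len(pos) + rho * n_syn)
c_syn = rho * c_orig

X = neg + pos + synthetic
y = [-1] * len(neg) + [1] * (len(pos) + n_syn)
penalties = [c_neg] * len(neg) + [c_orig] * len(pos) + [c_syn] * n_syn
w, b = train_weighted_linear_svm(X, y, penalties)

recall = sum(1 for p in pos if sum(wj * xj for wj, xj in zip(w, p)) + b > 0) / len(pos)
print(f"c_neg={c_neg:.2f} c_orig={c_orig:.2f} c_syn={c_syn:.2f} recall={recall:.2f}")
```

With this choice of penalties, adding more synthetic samples lowers the per-sample penalty on the positive class instead of inflating its total influence, which is one simple way to guard against overfitting to synthetic data. The weights derived in the paper may differ.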

More From: Computational Statistics and Data Analysis

Similar Papers

PERCEIVING DIGITAL WATERMARK DETECTION AS IMAGE CLASSIFICATION PROBLEM
P Then ... Y.C Wang
Journal of IT in Asia | VOL. 2
26 Apr 2016

Modeling of Cu-Au prospectivity in the Carajás mineral province (Brazil) through machine learning: Dealing with imbalanced training data
Elias Martins Guerra Prado ... João Gabriel Motta
Ore Geology Reviews | VOL. 124
05 Jun 2020

P0119 ARTIFICIAL INTELLIGENCE IN RENAL PATHOLOGY: IBM WATSON FOR THE IDENTIFICATION OF GLOMERULOSCLEROSIS
Giacomo Donato Cascarano ... Loreto Gesualdo
Nephrology Dialysis Transplantation | VOL. 35
01 Jun 2020

A Comparison of Weighted Support Vector Machine (WSVM), One-Step WSVM (OWSVM) and Iteratively WSVM (IWSVM) for Mislabeled Data
Syarizul Amri Mohd Dzulkifli ... Mohd Najib Mohd Salleh
05 Dec 2019
