Abstract

An over-sampling technique called V-synth is proposed and compared to borderline SMOTE (bSMOTE), a common method for balancing an imbalanced dataset before classification. V-synth generates synthetic minority points based on the properties of a Voronoi diagram. A Voronoi diagram is a collection of geometric regions, each encapsulating one classifying point, such that any location within a region is closer to its encapsulated point than to any other classifying point. Because of these properties, V-synth can identify exclusive regions of feature space where it is ideal to create synthetic minority samples. To test how well V-synth generalizes and applies across problems, six datasets from various problem domains were selected from the University of California, Irvine Machine Learning Repository. Although the outcome is not guaranteed, owing to the random nature of synthetic over-sampling, significant evidence is presented supporting the hypothesis that V-synth leads more consistently than bSMOTE to more accurate and better-balanced classification models when the classification complexity of a dataset is high.
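
The Voronoi property described above can be illustrated with a minimal sketch. It is not the V-synth algorithm itself, whose details the abstract does not give; it only shows that a location belongs to the Voronoi region of whichever point is nearest to it. The function name `voronoi_cell`, the toy coordinates, the labels, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math

def voronoi_cell(point, sites):
    """Return the index of the site whose Voronoi region contains `point`.

    Under Euclidean distance (an assumption here), a location lies in the
    region of its nearest site, so no explicit diagram has to be built.
    """
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

# Hypothetical 2-D toy data: three labeled points acting as Voronoi sites.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
labels = ["majority", "minority", "majority"]

# A candidate location in feature space and the region it falls into.
query = (3.0, 1.0)
cell = voronoi_cell(query, sites)
print(cell, labels[cell])
```

In this toy setup the query location falls in the region of the single minority point, which is the kind of region a Voronoi-based over-sampler could in principle treat as a candidate area for synthetic minority samples.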
