Abstract

The performance of machine learning algorithms is affected by an imbalanced distribution of data among classes. This issue is crucial in many practical problem domains, for example, medical diagnosis, network intrusion detection and fraud detection. Most efforts so far have focused on the binary class imbalance problem. However, class imbalance also arises in multi-class scenarios, and the solutions proposed for the two-class scenario are not applicable to multi-class domains. In this paper, we therefore develop an effective Weighted Multi-class Least Squares Twin Support Vector Machine (WMLSTSVM) approach to address imbalanced data classification in the multi-class setting. This work employs an appropriate weight setting in the loss function, i.e., it adjusts the cost of errors for imbalanced data in order to control the sensitivity of the classifier. To validate the proposed approach, experiments have been performed on fifteen benchmark d...
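The idea of adjusting the cost of errors through weights in the loss function can be illustrated with a minimal sketch. The abstract does not state which weighting scheme WMLSTSVM uses, so the inverse-frequency rule below (class k weighted by n / (K * n_k)) is an assumption, and the helpers `inverse_frequency_weights` and `weighted_squared_loss` are hypothetical names; this is not the authors' formulation, only a generic cost-sensitive squared loss.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Hypothetical helper (assumed scheme): weight class k by n / (K * n_k).

    n is the total number of samples, K the number of classes and n_k the
    size of class k, so rarer classes receive larger weights.
    """
    counts = Counter(labels)
    n, num_classes = len(labels), len(counts)
    return {cls: n / (num_classes * n_k) for cls, n_k in counts.items()}

def weighted_squared_loss(residuals, labels, weights):
    """Sum of squared residuals, each scaled by the weight of its sample's class."""
    return sum(weights[y] * r * r for r, y in zip(residuals, labels))

# Toy imbalanced problem: 6 samples of class "a", 3 of "b", 1 of "c".
labels = ["a"] * 6 + ["b"] * 3 + ["c"]
weights = inverse_frequency_weights(labels)

# The same unit error costs more on the rare class "c" than on the majority class "a".
err_on_a = weighted_squared_loss([1.0] + [0.0] * 9, labels, weights)
err_on_c = weighted_squared_loss([0.0] * 9 + [1.0], labels, weights)
print(err_on_a, err_on_c)
```

Under this scheme every class contributes the same total weight n / K, so the loss is no longer dominated by the majority class and the classifier becomes more sensitive to errors on minority classes.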

Highlights

  • Classification is one of the significant techniques of data mining which predicts the class label for any unknown input data

  • We evaluated the performance of the proposed WMLSTSVM classifier against other classifiers, such as Multi-SVM, AdaBoost.NC, Multiple Birth Twin Support Vector Machine (MBSVM) [81], OVO MLSTSVM and OVA MLSTSVM, using 10-fold cross-validation

  • WMLSTSVM performs significantly better than the AdaBoost.NC, Random Oversampling (ROS) MBSVM, OVO MLSTSVM, Global CS SVM, Static SMOTE SVM and ROS SVM classifiers in the non-linear case
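The 10-fold cross-validation protocol mentioned in the highlights can be sketched as below. The source does not say whether the folds were stratified; stratification is assumed here because, with imbalanced data, a purely random split can leave a rare class out of some test folds entirely. `stratified_k_fold` is a hypothetical helper, and training and scoring the classifiers are not shown.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=10, seed=0):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for k folds.

    Indices of each class are shuffled and dealt round-robin across the
    folds, so every fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        # Continue the round-robin where the previous class stopped, so the
        # leftover samples of small classes do not all land in the first folds.
        offset += len(idx)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Toy imbalanced labels: 50 of "a", 20 of "b", 5 of "c".
labels = ["a"] * 50 + ["b"] * 20 + ["c"] * 5
splits = list(stratified_k_fold(labels, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

A classifier would be trained on each `train_idx` subset and evaluated on the matching `test_idx` subset, with the per-fold scores averaged.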

Summary

Introduction

Classification is one of the significant techniques of data mining; it predicts the class label of any unseen input data. The degree of imbalance differs from one application domain to another, and correctly predicting data points of a rare class is often more important than correctly predicting those of the majority class. In disease diagnosis, for example, cases of disease are rare compared with the normal population, so correctly recognizing a person with the disease matters most.

Imbalanced Data problem
Algorithmic level solutions
Cost-sensitive solutions
Multi-class Imbalance problems
Global-CS
Other methods
Background
Twin Support Vector Machine
Least Squares Twin Support Vector Machine
Multiclass Least Squares Twin Support Vector Machine
Non-Linear MLSTSVM
Weighted Multi-class Least Squares Twin Support Vector Machine
Non-Linear WMLSTSVM
Dataset Description
Findings
Performance Evaluation Measures
Conclusion