Abstract

The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated by minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the classification efficacy of MMTS, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study from the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with a Mahalanobis Genetic Algorithm (MGA).
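
As an illustration of the threshold-selection idea described above, the sketch below picks, from a set of candidate Mahalanobis-distance thresholds, the one whose ROC point (FPR, TPR) lies closest to the ideal corner (0, 1). The function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def select_threshold(md_scores, labels, thresholds):
    """Pick the threshold whose ROC point (FPR, TPR) is closest to the
    ideal corner (0, 1) -- a minimal sketch of the distance-minimization
    criterion described in the abstract, not the paper's exact model."""
    best_t, best_d = None, np.inf
    for t in thresholds:
        pred = (md_scores > t).astype(int)            # 1 = abnormal / minority class
        tp = np.sum((pred == 1) & (labels == 1))
        fn = np.sum((pred == 0) & (labels == 1))
        fp = np.sum((pred == 1) & (labels == 0))
        tn = np.sum((pred == 0) & (labels == 0))
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        d = np.sqrt((1.0 - tpr) ** 2 + fpr ** 2)      # distance to (FPR=0, TPR=1)
        if d < best_d:
            best_t, best_d = t, d
    return best_t
```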

Highlights

  • Classification is one of the supervised learning approaches in which a new observation needs to be assigned to one of the predetermined classes or categories

  • To discriminate among the classifiers' performances, a nonparametric pairwise Wilcoxon test was performed to test the null hypothesis that two classifiers have equal medians against the alternative hypothesis that the first classifier's median is larger than the second's; the results of these comparisons are summarized in a ranking score for each classifier on each dataset (see the sketch after this list)

  • The effect of the f-ratio is dominated by the imbalance ratio (IR) effect

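The pairwise Wilcoxon comparison mentioned in the highlights can be carried out with a standard statistics library. The sketch below is a minimal illustration using hypothetical per-dataset scores; the numbers are placeholders, not results from the paper.

```python
from scipy.stats import wilcoxon

# Hypothetical per-dataset performance scores (e.g., G-mean) for two classifiers;
# the values are placeholders for illustration only.
mmts_scores = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92]
svm_scores  = [0.86, 0.84, 0.90, 0.80, 0.88, 0.83, 0.89]

# One-sided paired test: H0 = the two classifiers have equal medians,
# H1 = the first classifier's median is larger than the second's.
stat, p_value = wilcoxon(mmts_scores, svm_scores, alternative='greater')
print(f"W = {stat:.3f}, p = {p_value:.4f}")
```
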

Introduction

Classification is one of the supervised learning approaches in which a new observation needs to be assigned to one of the predetermined classes or categories. If the number of predetermined classes is more than two, it is a multiclass classification problem; otherwise, it is known as a binary classification problem. At present, these problems have found applications in different domains such as product quality [1] and speech recognition [2]. The border that separates balanced from imbalanced data is vague; for example, the imbalance ratio, which is the ratio of major-class to minor-class observations, has been reported to range from values as small as 100:1 up to 10000:1 [5].
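
As a small illustration of the imbalance ratio defined above, the snippet below computes IR for a hypothetical labeled dataset; the class counts are assumptions, not data from the paper.

```python
import numpy as np

# Hypothetical dataset: 9900 majority-class and 99 minority-class observations.
labels = np.array([0] * 9900 + [1] * 99)
counts = np.bincount(labels)
ir = counts.max() / counts.min()   # IR = major-class count / minor-class count
print(f"IR = {ir:.0f} : 1")        # -> IR = 100 : 1
```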

