Abstract

The Mahalanobis–Taguchi system (MTS) is a multivariate diagnosis and prediction technique that is widely used for large-sample or unbalanced data but rarely for high-dimensional small-sample data. In this paper, an optimized MTS for the classification of high-dimensional small-sample data is discussed from two aspects: the instability of the inverse of the covariance matrix and the instability of feature selection. First, based on regularization and smoothing techniques, a modified Mahalanobis metric is proposed to calculate the Mahalanobis distance, aiming to reduce the influence of inverse-matrix instability under small-sample conditions. Second, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into MTS to address the instability of feature selection. Using the mRMR algorithm and the signal-to-noise ratio (SNR), a two-stage feature selection method is proposed: the mRMR algorithm first removes noisy and redundant variables; the orthogonal table and SNR then screen the combination of variables that contributes most to classification. The feasibility and simplicity of the optimized MTS are demonstrated on five datasets from the UCI repository. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method improves the effectiveness of feature selection for MTS. Finally, the optimized MTS is applied to email classification on the Spambase dataset, where it outperforms the classical MTS and three other machine learning algorithms.
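
The abstract does not reproduce the exact RS-MD formula; as an illustration of the underlying idea, the distance can be sketched with a shrinkage-regularized covariance estimate, which keeps the matrix invertible even when there are fewer samples than features. The shrinkage weight `lam` and the scaled-identity target are assumptions for this sketch, not the paper's construction:

```python
import numpy as np

def regularized_mahalanobis(X, x, lam=0.1):
    """Mahalanobis distance of sample x to the reference group X,
    using a shrinkage-regularized covariance estimate.
    lam is a hypothetical shrinkage weight, not taken from the paper."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    # Shrink toward a scaled identity so the inverse stays stable
    # even when S is singular (n < p, the small-sample case).
    target = np.trace(S) / p * np.eye(p)
    S_reg = (1.0 - lam) * S + lam * target
    d = x - mu
    # Scaled Mahalanobis distance (divided by the dimension, as in MTS)
    return float(d @ np.linalg.solve(S_reg, d) / p)
```

With `lam > 0` the regularized covariance is positive definite, so the distance is finite and positive even when the raw covariance matrix cannot be inverted.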

Highlights

  • The optimized Mahalanobis–Taguchi system (MTS) for the classification of high-dimensional small sample data is discussed from two aspects, namely, the inverse matrix instability of the covariance matrix and the instability of feature selection

  • The Mahalanobis distance of each sample is calculated with a modified metric based on regularization and smoothing techniques (RS-MD), which reduces the influence of inverse-matrix instability under small-sample conditions

  • The two-stage feature selection ensures the robustness of the selected feature combination by using the minimum redundancy-maximum relevance (mRMR) algorithm and improves the classification accuracy by using the orthogonal table and signal-to-noise ratio (SNR). Therefore, it achieves the goals of robust optimization and dimension reduction. The optimized Mahalanobis–Taguchi system uses the Mahalanobis distance based on regularization and smoothing techniques (RS-MD) as its measurement scale and the two-stage feature selection method to screen features. The algorithm flow of the optimized Mahalanobis–Taguchi system is presented in Algorithm 1
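
The two-stage idea can be sketched in simplified form: stage 1 is a greedy mRMR pass, and stage 2 scores candidate feature combinations (in the paper, generated by an orthogonal table) with the larger-the-better SNR computed from the Mahalanobis distances of abnormal samples. The correlation proxy for mutual information and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR (difference criterion), using absolute Pearson
    correlation as a simplified stand-in for mutual information."""
    n, p = X.shape
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            # Average redundancy against already-selected features
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

def snr_larger_better(md_abnormal):
    """Larger-the-better signal-to-noise ratio used in MTS to score a
    feature combination by the MDs of the abnormal samples."""
    md = np.asarray(md_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / md**2))
```

A combination that pushes abnormal samples farther from the normal group (larger MDs) yields a higher SNR, which is why SNR serves as the screening criterion in stage 2.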

Summary

Comparative Analysis between the Two-Stage Feature Selection Method and the Feature Selection of Traditional MTS. On the basis of the results of the two feature selection methods, the classification accuracy on each dataset after feature selection is calculated. A total of 360 emails (190 regular emails and 170 spam emails) were used in the email classification experiment.
