Abstract

In this paper, we focus on a new model selection procedure for discriminant analysis. Combining a re-sampling technique with k-fold cross-validation, we develop a "k-fold cross-validation for small sample" method. With this method, we obtain the mean error rate in the validation samples (M2) and the 95% confidence interval (CI) of the discriminant coefficients. Moreover, we propose a model selection procedure in which the model with the minimum M2 is chosen as the best model. We apply this new method and procedure to the pass/fail determination of exam scores. In this case, we fix the constant to 1 for seven linear discriminant functions (LDFs) and obtain several good results, as follows: 1) The M2 of Fisher's LDF is over 4.6% worse than that of Revised IP-OLDF. 2) A soft-margin SVM with penalty c=1 (SVM1) is worse than the other mathematical-programming (MP) based LDFs and logistic regression. 3) The 95% CIs of the best discriminant coefficients were obtained. The seven LDFs other than Fisher's LDF are almost the same as a trivial LDF for the linearly separable model. Furthermore, if we choose the median of the coefficients of the seven LDFs other than Fisher's LDF, these are almost the same as the trivial LDF for the linearly separable model.

Highlights

  • In this paper, we propose a new model selection procedure for discriminant analysis using the "k-fold cross-validation for small sample" method [14, 18]

  • We focus on two mean error rates, M1 and M2, in the training and validation samples, respectively, and propose that the model with the minimum M2 is the best model

  • We propose the new method and model selection procedure as follows: 1) We discriminate the original data by eight LDFs and two other discriminant functions, the quadratic discriminant function (QDF) and Regularized Discriminant Analysis (RDA) [2]



Introduction

We propose a new model selection procedure for discriminant analysis using the "k-fold cross-validation for small sample" method [14, 18]. Lachenbruch et al. [4] proposed a leave-one-out (LOO) method for model selection in discriminant analysis, but they could not achieve such a method because of a lack of computer power at the time. If we fix k=100, we obtain 100 LDFs and 100 error rates in the training and validation samples. We consider the model with the minimum M2 among all possible combinations of models [3] to be the best model. We apply this new method and procedure to three data sets of pass/fail determinations and obtain good results. The new method and procedure give us a precise and deterministic judgment about model selection in discriminant analysis. By fixing the constant to 1, most LDFs other than Fisher's LDF are almost the same as trivial LDFs.
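The selection loop above can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's actual procedure: the data generator, the simple threshold discriminant, and the candidate feature subsets are all hypothetical stand-ins for the exam-score data and the eight LDFs, but the skeleton is the same — re-sample a small original sample, split it into k=100 folds, compute the mean validation error rate M2 for each candidate model, and pick the model with the minimum M2.

```python
import random

random.seed(0)

# Hypothetical stand-in for the pass/fail exam data: each student has four
# subject scores and passes iff the total is at least 200 (linearly separable).
def make_student():
    scores = [random.randint(0, 100) for _ in range(4)]
    return scores, int(sum(scores) >= 200)

data = [make_student() for _ in range(40)]              # small original sample
resampled = [random.choice(data) for _ in range(2000)]  # re-sampled pseudo-sample

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def fit_threshold(train, feats):
    # Simple stand-in discriminant: sum the selected scores and cut at the
    # midpoint between the class means (not one of the paper's eight LDFs).
    s_fail = [sum(x[j] for j in feats) for x, y in train if y == 0]
    s_pass = [sum(x[j] for j in feats) for x, y in train if y == 1]
    return (mean(s_fail) + mean(s_pass)) / 2

def error_rate(sample, feats, thr):
    wrong = sum(1 for x, y in sample
                if int(sum(x[j] for j in feats) >= thr) != y)
    return wrong / len(sample)

k = 100
fold = len(resampled) // k
models = [(0, 1, 2, 3), (0, 1, 2), (0, 1), (0,)]  # candidate feature subsets

best = None
for feats in models:
    m2 = 0.0
    for i in range(k):
        val = resampled[i * fold:(i + 1) * fold]                # validation fold
        train = resampled[:i * fold] + resampled[(i + 1) * fold:]
        thr = fit_threshold(train, feats)
        m2 += error_rate(val, feats, thr)
    m2 /= k                                                     # mean M2 over 100 folds
    if best is None or m2 < best[0]:
        best = (m2, feats)

print("best model (minimum M2):", best[1], "M2 =", round(best[0], 4))
```

With 100 folds, each candidate model yields 100 trained discriminants and 100 validation error rates, so the comparison among models rests on a mean rather than a single split, which is the point of the "k-fold cross-validation for small sample" idea.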

