Abstract
Revised IP-OLDF (optimal linear discriminant function by integer programming) is a linear discriminant function that minimizes the number of misclassifications (NM) of training samples by integer programming (IP). However, IP requires a large computation (CPU) time. In this paper, a method to reduce CPU time by using linear programming (LP) is proposed. In the first phase, Revised LP-OLDF is applied to all cases, and the cases are categorized into two groups: those that are classified correctly by support vectors (SVs) and those that are not. In the second phase, Revised IP-OLDF is applied to the cases misclassified by SVs. This method is called Revised IPLP-OLDF. In this research, it is evaluated whether the NM of Revised IPLP-OLDF is a good estimate of the minimum number of misclassifications (MNM) obtained by Revised IP-OLDF. Four kinds of real data—Iris data, Swiss bank note data, student data, and CPD data—are used as training samples. Four kinds of 20,000 re-sampled cases generated from these data are used as evaluation samples. In total, these data yield 149 models over all combinations of independent variables. The NMs and CPU times of the 149 models are compared between Revised IPLP-OLDF and Revised IP-OLDF. The following results are obtained: 1) Revised IPLP-OLDF significantly reduces CPU time. 2) For the training samples, all 149 NMs of Revised IPLP-OLDF are equal to the MNMs of Revised IP-OLDF. 3) For the evaluation samples, most NMs of Revised IPLP-OLDF are equal to the NMs of Revised IP-OLDF. 4) The generalization abilities of both discriminant functions are concluded to be high, because the differences between the error rates of the training and evaluation samples are almost within 2%. Therefore, Revised IPLP-OLDF is recommended for the analysis of big data instead of Revised IP-OLDF. Next, Revised IPLP-OLDF is compared with LDF and logistic regression by 100-fold cross-validation using 100 re-sampled samples.
The mean error rates of Revised IPLP-OLDF are remarkably lower than those of LDF and logistic regression.
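The two-phase decomposition described in the abstract can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the coefficient vector `b` from Revised LP-OLDF is assumed to be already computed, labels are assumed to be ±1, and the IP step of phase 2 is only indicated by the returned subset, since solving Revised IP-OLDF requires an IP solver.

```python
def discriminant(b, x):
    """f(x) = t(b) * x + 1, the discriminant form used in the paper."""
    return sum(bj * xj for bj, xj in zip(b, x)) + 1.0

def split_by_svs(b, cases, labels):
    """Phase 1: categorize cases by the support vectors f(x) = +/-1.

    Cases with y * f(x) >= 1 lie on the correct side of (or on) the SVs
    and are fixed; the remaining cases form the smaller subset on which
    Revised IP-OLDF is solved in phase 2.
    """
    fixed, to_phase2 = [], []
    for x, y in zip(cases, labels):
        if y * discriminant(b, x) >= 1.0:
            fixed.append((x, y))
        else:
            to_phase2.append((x, y))
    return fixed, to_phase2

# Toy one-feature data and a hypothetical LP-OLDF coefficient vector.
cases = [[2.0], [3.0], [-2.0], [-0.5]]
labels = [1, 1, -1, -1]
b = [1.0]  # f(x) = x + 1
fixed, hard = split_by_svs(b, cases, labels)
```

Because phase 2 runs IP only on `hard`, which is typically much smaller than the full training sample, the overall CPU time drops while the phase-1 classifications are retained.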
Highlights
In this paper, four linear discriminant functions by mathematical programming (MP) are introduced
Revised IPLP-optimal linear discriminant function (OLDF) is defined in two phases as follows: in the first phase, Revised LP-OLDF is applied to all cases, and the cases are categorized into two groups: those that are classified correctly by SVs and those that are not
All NMs obtained by Revised IPLP-OLDF are the same as the minimum number of misclassifications (MNM) of Revised integer programming (IP)-OLDF
Summary
Four linear discriminant functions by mathematical programming (MP) are introduced, which clarifies the relation between discriminant functions and NMs. If the training data consist of n cases and p features, the n linear equations Hi(b) = t(xi) ∗ b + 1 = 0 divide the p-dimensional coefficient space into a finite number of convex polyhedra. A case xi in the data space corresponds to the linear equation Hi(b) = 0 in the discriminant coefficient space, and a point bj in the coefficient space corresponds to the discriminant function fj(x) = t(bj) ∗ x + 1. If an LDF corresponds to an interior point bj, it is theoretically free from the unresolved problem; this is confirmed by checking that the number of cases with |f(x)| ≤ 10^−6 is zero. Revised IP-OLDF resolves the problems of discriminant theory, but it requires more CPU time because it is solved by IP. Revised IPLP-OLDF is compared with Fisher's LDF and logistic regression by 100-fold cross validation using 100 re-sampling samples [18, 19]
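The two checks described above can be expressed directly. The following is a minimal pure-Python sketch, assuming labels yi ∈ {+1, −1} and the paper's discriminant form f(x) = t(b) ∗ x + 1; the sign convention y ∗ f(x) ≤ 0 for a misclassification is an assumption chosen so that a case lying exactly on the hyperplane, the unresolved problem, is counted as misclassified.

```python
def discriminant(b, x):
    """f(x) = t(b) * x + 1, as defined in the summary."""
    return sum(bj * xj for bj, xj in zip(b, x)) + 1.0

def nm_and_on_boundary(b, cases, labels, eps=1e-6):
    """Return (NM, number of cases with |f(x)| <= eps).

    NM counts misclassified cases (y * f(x) <= 0).  The second count
    must be zero for the function to be free from the unresolved
    problem, i.e. no case may lie on the discriminant hyperplane.
    """
    nm = sum(1 for x, y in zip(cases, labels)
             if y * discriminant(b, x) <= 0.0)
    on_boundary = sum(1 for x in cases
                      if abs(discriminant(b, x)) <= eps)
    return nm, on_boundary

# Toy one-feature data: the last case falls exactly on f(x) = 0,
# so it is both misclassified and flagged by the boundary check.
cases = [[1.0], [2.0], [-3.0], [-1.0]]
labels = [1, 1, -1, -1]
nm, on_b = nm_and_on_boundary([1.0], cases, labels)
```

In this example nm is 1 and on_b is 1: the case at x = −1 gives f(x) = 0, illustrating exactly the situation that the |f(x)| ≤ 10^−6 check is designed to detect.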
Journal: Statistics, Optimization & Information Computing