Abstract
A new theory of discriminant analysis after R. Fisher (“the Theory”) is explained. There are five serious problems with discriminant analysis, and I completely solve these problems through five mathematical programming-based linear discriminant functions (MP-based LDFs). First, I develop an optimal linear discriminant function using integer programming (IP-OLDF) based on a minimum number of misclassifications (minimum NM, MNM) criterion. We consider discriminating data with n cases and p variables, where each case xi = (x1i, …, xpi) is a p-dimensional vector (i = 1, …, n). Because I formulate IP-OLDF in the p-dimensional discriminant coefficient space b, the n linear hyperplanes (xi b + 1 = 0) divide the coefficient space into a finite number of convex polyhedrons (CPs). All LDFs that correspond to interior points of the same CP misclassify the same k cases, and this clearly reveals the relationship between NM and the discriminant coefficients. Because there are finitely many CPs in the discriminant coefficient space, we should select an interior point of the CP with MNM; we call this CP the “optimal CP (OCP).” MNM decreases monotonically (MNMp ≥ MNM(p+1)); therefore, if MNMp = 0, the MNMs of all models including these p variables are zero. If the data are in general position, IP-OLDF finds a vertex of the true OCP. However, if the data are not in general position, such as the Student data, IP-OLDF might not find a vertex of the true OCP; therefore, I develop Revised IP-OLDF, which searches for an interior point of the true OCP directly. If an LDF corresponds to a CP vertex or edge, more than p cases lie on the discriminant hyperplane and the LDF cannot discriminate these cases correctly (Problem 1); this means that NM might not be true. Only Revised IP-OLDF is free from Problem 1. When IP-OLDF discriminates the Swiss banknote data, which have six variables, the MNM of the two-variable model (X4, X6) is zero; therefore, the 16 models including (X4, X6) have MNM = 0, and the other 47 models are not linearly separable. Although a hard-margin SVM (H-SVM) identifies linearly separable data (LSD) clearly, there is little research on LSD discrimination, and most statisticians erroneously believe that the purpose of discrimination is to discriminate overlapping data, not LSD. All LDFs except H-SVM and Revised IP-OLDF might not discriminate LSD correctly (Problem 2). Moreover, such LDFs cannot determine whether the data overlap or are LSD, whereas MNM = 0 means LSD and MNM > 0 means overlap. I demonstrate that Fisher’s LDF and the quadratic discriminant function (QDF) cannot judge the pass/fail determination using examination scores, and that the 18 error rates of both discriminant functions are very high. Using Japanese automobile data, I explain the defect of the generalized inverse matrix technique and show that QDF misclassifies all cases of class 1 into class 2 in a particular case (Problem 3). Fisher never formulated equations for the standard errors (SEs) of the error rate and discriminant coefficients (Problem 4). The k-fold cross-validation for small samples method (Method 1) solves Problem 4: it offers the error rate means M1 and M2 from the training and validation samples, in addition to 95 % confidence intervals (CIs) of the error rate and coefficients. I propose a simple and powerful model selection procedure that selects the best model with minimum M2 instead of the leave-one-out (LOO) procedure. The best models of Revised IP-OLDF are better than those of seven other LDFs.
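To make the MNM criterion concrete, the following is a minimal sketch of a minimum-number-of-misclassifications discriminant expressed as a mixed-integer program in Python with the PuLP package. It illustrates the idea only and is not the author's exact IP-OLDF or Revised IP-OLDF formulation; the big-M constant, the free intercept b0, and the name mnm_ldf are assumptions of this sketch.

```python
# Minimal MNM-style discriminant sketch (not the author's exact formulation):
# minimize the count of binary misclassification indicators e_i under a big-M constraint.
import pulp

def mnm_ldf(X, y, big_m=1e4):
    """X: list of p-dimensional cases; y: labels in {+1, -1}."""
    n, p = len(X), len(X[0])
    prob = pulp.LpProblem("MNM_discriminant", pulp.LpMinimize)
    b = [pulp.LpVariable(f"b{j}") for j in range(p)]
    b0 = pulp.LpVariable("b0")
    e = [pulp.LpVariable(f"e{i}", cat="Binary") for i in range(n)]
    # Objective: the number of misclassifications (NM); its optimum is the MNM.
    prob += pulp.lpSum(e)
    for i in range(n):
        # Correctly classified cases satisfy y_i (x_i b + b0) >= 1;
        # setting e_i = 1 relaxes the constraint for misclassified cases.
        prob += y[i] * (pulp.lpSum(X[i][j] * b[j] for j in range(p)) + b0) >= 1 - big_m * e[i]
    prob.solve()
    return [v.value() for v in b], b0.value(), int(pulp.value(prob.objective))
```

In this sketch, an optimal objective value of zero indicates LSD (MNM = 0), while a positive value indicates overlapping classes.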
For more than ten years, many researchers have struggled to analyze microarray datasets (the dataset), which are LSD (Problem 5). We call a linearly separable dataset the largest Matroska. Only Revised IP-OLDF can select features naturally and find a smaller gene set or subspace (a smaller Matroska) in the dataset. When we discriminate this smaller Matroska again, we find an even smaller Matroska. If we cannot find a smaller Matroska anymore, I call the last one a small Matroska (SM), which is a linearly separable gene subspace. Because the dataset has the structure of a Matroska, I develop a Matroska feature-selection method (Method 2) that finds the surprising structure of the dataset: the disjoint union of several SMs, which are linearly separable subspaces or models. Now, we can analyze each SM very quickly because all SMs are small samples. The Theory is most suitable for analyzing these datasets.
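As an illustration of the recursive idea behind Method 2, the sketch below repeatedly discriminates the current gene subspace and keeps only the genes that still contribute, reusing the hypothetical mnm_ldf() above. The zero-coefficient pruning rule, the tolerance, and the name find_small_matroska are assumptions of this sketch rather than the author's exact procedure.

```python
# Sketch of the recursive Matroska idea: discriminate the current gene subspace,
# drop genes whose coefficients are (numerically) zero, and repeat until no
# smaller linearly separable subspace can be found.
def find_small_matroska(X, y, genes, tol=1e-8):
    while True:
        cols = [[row[g] for g in genes] for row in X]
        b, b0, mnm = mnm_ldf(cols, y)
        if mnm > 0:
            return None           # subspace is not linearly separable
        kept = [g for g, coef in zip(genes, b) if abs(coef) > tol]
        if len(kept) == len(genes):
            return genes          # no smaller Matroska found: this is an SM
        genes = kept              # discriminate the smaller Matroska again
```

Repeating this search after removing the genes of each SM found would, under these assumptions, yield the disjoint union of SMs described above.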