Abstract

<p>This paper examines robust classification procedures in two-group discriminant analysis with multivariate binary variables. A normal-distribution-based data set was generated using the R statistical software, version 2.15.3. Using Bartlett’s approximation to chi-square, the data set was found to be homogeneous, and it was then subjected to five linear classifiers: the maximum likelihood discriminant function, Fisher’s linear discriminant function, the likelihood ratio function, the full multinomial function and the nearest neighbour function rule. To judge the performance of these procedures, the apparent error rate of each procedure was obtained for different sample sizes. The results ranked the procedures as follows: Fisher’s linear discriminant function, maximum likelihood, full multinomial, likelihood ratio function and nearest neighbour function.</p>

Highlights

  • This paper focuses on robust classification procedures in two-group discriminant analysis with multivariate binary variables.

  • It is well documented that parametric statistical methods such as Fisher’s linear discriminant function (LDF) (Fisher 1936) and Smith’s quadratic discriminant function (QDF) (Smith 1947) may yield poor classification results if the assumption of multivariate normally distributed attributes is violated to a significant extent (McLachlan 1992, Huberty 1994).

  • We consider a classical problem of discriminant analysis: an individual is to be allocated to one of c distinct classes w1, ..., wc, whose members are described by an r-component vector of binary variables X = (x1, x2, ..., xr).

Summary

Introduction

A considerable body of research has accumulated on classification analysis, and its usefulness has been demonstrated in various fields, including engineering, medical and social sciences, economics, marketing, finance and management (Anderson 1972, McLachlan 1992, Joachimsthaler and Stam 1988, 1990, Ragsdale and Stam 1992, Huberty 1994, Onyeagu 2003, Okonkwo 2011, Ekezie 2012, Egbo, Onyeagu and Ekezie 2014). Some statistical classification methods are based on distance measures; some involve probability density functions and variance-covariance matrices and have a Bayes decision-theoretic probabilistic interpretation; others have a geometric interpretation only. We consider a classical problem of discriminant analysis: an individual is to be allocated to one of c distinct classes w1, ..., wc, whose members are described by an r-component vector of binary variables X = (x1, x2, ..., xr). Most studies that compared non-normal classification methods with normality-based methods under various data conditions have assumed equal misclassification costs across groups. The purpose of the current study is to establish guidelines for choosing an appropriate classification method when the problem at hand is characterized by multivariate Bernoulli data. This study is limited to the two-group classification problem.
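To make the two-group allocation problem and the apparent error rate concrete, the following is a minimal sketch, not the authors' implementation. It assumes equal priors and equal misclassification costs, simulates binary vectors with independent components (an illustrative choice, not the paper's data-generating scheme), and applies a full multinomial rule that estimates each pattern's probability by its relative frequency in each group. The function names, sample sizes and probabilities are hypothetical.

```python
import random
from collections import Counter


def simulate_group(n, probs, rng):
    """Draw n binary vectors; component j equals 1 with probability probs[j].

    Independence between components is an assumption made for illustration.
    """
    return [tuple(int(rng.random() < p) for p in probs) for _ in range(n)]


def fit_full_multinomial(group1, group2):
    """Build a two-group rule from relative frequencies of each binary pattern."""
    c1, c2 = Counter(group1), Counter(group2)
    n1, n2 = len(group1), len(group2)

    def classify(x):
        # Allocate x to group 1 when its estimated pattern probability there
        # is at least as large as in group 2 (ties and unseen patterns go to 1).
        return 1 if c1[x] / n1 >= c2[x] / n2 else 2

    return classify


def apparent_error_rate(classify, group1, group2):
    """Proportion of training observations misallocated by the fitted rule."""
    errors = sum(classify(x) != 1 for x in group1)
    errors += sum(classify(x) != 2 for x in group2)
    return errors / (len(group1) + len(group2))


rng = random.Random(0)  # fixed seed so the run is reproducible
g1 = simulate_group(100, [0.3, 0.3, 0.3], rng)  # hypothetical group 1
g2 = simulate_group(100, [0.7, 0.7, 0.7], rng)  # hypothetical group 2
rule = fit_full_multinomial(g1, g2)
aer = apparent_error_rate(rule, g1, g2)
print(f"apparent error rate: {aer:.3f}")
```

Because the apparent error rate is computed on the same observations used to fit the rule, it tends to be optimistic, which is worth keeping in mind when comparing procedures across sample sizes.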

Classification Procedures
Testing Adequacy of Discriminant Coefficient
Probability of Misclassification
Simulation Experiments and Results