Comparison of performance of five common classifiers represented as boundary methods: Euclidean Distance to Centroids, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization and Support Vector Machines, as dependent on data structure

Sarah J Dixon, Richard G Brereton

doi:10.1016/j.chemolab.2008.07.010

Abstract

Five methods for discrimination are described and illustrated graphically as boundary methods: Euclidean Distance to Centroids (EDC), Linear Discriminant Analysis (LDA) (based on the Mahalanobis distance and the pooled variance-covariance matrix), Quadratic Discriminant Analysis (QDA) (based on the Mahalanobis distance and the individual class variance-covariance matrices, in its non-Bayesian form), Learning Vector Quantization (LVQ) and Support Vector Machines (SVMs) (using soft boundaries and Radial Basis Functions). The performance of each method was determined using four synthetic datasets, each consisting of 200 samples split equally between two classes, and a further two synthetic datasets containing 400 samples, again split equally between the two classes. In datasets 1 to 3, five variables were distributed multinormally: in dataset 1 the classes are distributed roughly circularly but with a significant degree of overlap; in dataset 2 the distribution is in elongated hyperellipsoids with small overlap; and in dataset 3 there is a region of complete overlap between classes. In dataset 4, two variables are distributed in a crescent shape. In datasets 5 and 6, 100 variables were generated from multinormal populations; some were potential discriminators, but a large proportion were designed to be uninformative. The methods were optimised using a training set and their performance evaluated using a test set; this was repeated 100 times for different test and training set splits. The average % correctly classified was computed for each class and model, as well as the model stability for each sample (the proportion of times the sample is classified into the same group over all 100 iterations). The conclusions are that classifier performance depends strongly on the distribution of the data. Approaches such as LVQ and SVMs that try to determine complex boundaries perform best when the data are not normally distributed, as in dataset 4, but can be prone to overfitting otherwise. QDA tends to perform best on multinormal data, although it can be influenced by non-discriminative variables that show a difference in variance. It is recommended to examine the data structure prior to model building to determine the optimal type of model.
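
The evaluation protocol summarised in the abstract can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. It covers only the three distance-based classifiers (EDC, LDA with a pooled covariance matrix, QDA with individual class covariance matrices and no Bayesian prior or log-determinant term); LVQ and SVMs are omitted. The class means, the identity covariance, the 2/3 training fraction, the stratified splitting, and the choice to compute stability only over iterations in which a sample falls in the test set are all assumptions made for illustration; none of them is specified in the abstract.

```python
import numpy as np

def fit_stats(X, y):
    """Estimate class centroids, per-class covariances and the pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    covs = np.array([np.cov(X[y == c], rowvar=False) for c in classes])
    n = np.array([(y == c).sum() for c in classes])
    pooled = np.tensordot(n - 1, covs, axes=1) / (n.sum() - len(classes))
    return classes, means, covs, pooled

def predict(X, stats, method):
    """Assign each row of X to the class with the smallest squared distance."""
    classes, means, covs, pooled = stats
    d2 = np.empty((X.shape[0], len(classes)))
    for k in range(len(classes)):
        diff = X - means[k]
        if method == "EDC":                      # Euclidean distance to centroid
            d2[:, k] = np.sum(diff ** 2, axis=1)
        else:                                    # Mahalanobis distance
            S = pooled if method == "LDA" else covs[k]   # "QDA": class covariance
            d2[:, k] = np.sum((diff @ np.linalg.inv(S)) * diff, axis=1)
    return classes[np.argmin(d2, axis=1)]

def evaluate(X, y, method, n_splits=100, train_frac=2 / 3, seed=0):
    """Repeat random train/test splits; return per-class %CC and per-sample stability."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    votes = np.zeros((len(y), len(classes)))     # test-set predictions per sample
    correct = np.zeros(len(classes))
    tested = np.zeros(len(classes))
    for _ in range(n_splits):
        train = np.zeros(len(y), dtype=bool)
        for c in classes:                        # stratified split (assumption)
            idx = rng.permutation(np.flatnonzero(y == c))
            train[idx[: int(round(train_frac * len(idx)))]] = True
        test_idx = np.flatnonzero(~train)
        pred = predict(X[test_idx], fit_stats(X[train], y[train]), method)
        for k, c in enumerate(classes):
            votes[test_idx[pred == c], k] += 1
            in_c = y[test_idx] == c
            correct[k] += np.sum(pred[in_c] == c)
            tested[k] += in_c.sum()
    pcc = 100 * correct / tested
    # Stability: share of test-set appearances assigned to the most frequent class.
    stability = votes.max(axis=1) / np.maximum(votes.sum(axis=1), 1)
    return pcc, stability

# Hypothetical two-class data: five multinormal variables, 100 samples per class.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.multivariate_normal(np.zeros(5), np.eye(5), 100),
    data_rng.multivariate_normal(np.full(5, 1.0), np.eye(5), 100),
])
y = np.repeat([0, 1], 100)

results = {m: evaluate(X, y, m) for m in ("EDC", "LDA", "QDA")}
for m, (pcc, stability) in results.items():
    print(m, "%CC per class:", np.round(pcc, 1), "mean stability:", round(stability.mean(), 3))
```

Samples whose stability is close to 0.5 are the ones that flip between classes from split to split, which in this sketch corresponds to points lying near the class boundary.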

Journal: Chemometrics and Intelligent Laboratory Systems
Publication Date: Aug 3, 2008
Citations: 189
