Abstract

The recognition performance of the sample Mahalanobis distance (SMD) deteriorates as the number of learning samples decreases. It is therefore important to correct the SMD toward the population Mahalanobis distance (PMD) so that it becomes equivalent to the case of infinite learning samples. To reduce the computation time and cost of this correction, this paper presents a method that does not require estimating the population eigenvalues or eigenvectors of the covariance matrix. In short, the method requires only the sample eigenvalues of the covariance matrix, the number of learning samples, and the dimensionality. It combines the summation of the SMD’s principal components (each divided by its expectation, obtained using the delta method), Lawley’s bias estimation of the sample eigenvalues, and the variances of the sample eigenvectors. A numerical experiment demonstrates that the method works well across various numbers of learning samples, dimensionalities, population eigenvalue sequences, and non-centralities. An application also shows improved performance when estimating a Gaussian mixture model with the expectation–maximization algorithm.
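
To see why such a correction matters, here is a minimal Monte Carlo sketch (not from the paper; plain NumPy, with an assumed identity population covariance) of how the SMD T² inflates relative to the PMD D² as the number of learning samples n shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10          # dimensionality
trials = 2000   # Monte Carlo repetitions

# Population: zero mean, identity covariance, so the PMD of a test
# point y is simply ||y||^2, with expectation E[D^2] = p.
for n in (20, 50, 200, 1000):
    t2 = np.empty(trials)
    for k in range(trials):
        X = rng.standard_normal((n, p))    # n learning samples
        y = rng.standard_normal(p)         # independent test point
        d = y - X.mean(axis=0)
        S = np.cov(X, rowvar=False)        # unbiased sample covariance
        t2[k] = d @ np.linalg.solve(S, d)  # SMD: T^2 = d' S^{-1} d
    print(f"n={n:5d}  mean T^2 = {t2.mean():6.2f}  vs  E[D^2] = {p}")
```

Under normality the mean of T² here is p(n + 1)(n − 1)/(n(n − p − 2)), which approaches p only as n grows; for n = 20 and p = 10 it is roughly 25, more than double the PMD expectation.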

Highlights

  • The Mahalanobis distance (MD) has been used for statistical learning as a basic discriminator under multidimensional normal distributions [1,2]; its sample version is T² = (y − x)ᵀ S⁻¹ (y − x) (Eq. 1)

  • In order to correct the sample Mahalanobis distance (SMD) T² toward the population Mahalanobis distance (PMD) D², this paper presents a correction method that requires only the sample eigenvalues of the sample covariance matrix, the dimensionality p, and the number of learning samples n

  • By combining the delta method from statistics, Lawley’s bias estimation of the sample eigenvalues, and the variances of the sample eigenvectors, the approximated expectation can be obtained from the sample eigenvalues, p, and n alone (a standalone sketch of the Lawley ingredient follows this list)
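
As a standalone illustration of the Lawley ingredient, the sketch below implements the classical first-order bias expansion for the eigenvalues of a sample covariance matrix, E[lᵢ] ≈ λᵢ(1 + (1/n) Σ_{j≠i} λⱼ/(λᵢ − λⱼ)) (Lawley, 1956), and inverts it by plugging in the sample eigenvalues. This is only the generic formula, not the paper’s full correction; the plug-in inversion and the use of n (rather than n − 1) in the denominator are assumptions here.

```python
import numpy as np

def lawley_debias(l, n):
    """First-order de-biasing of sample eigenvalues l (descending order).

    Based on Lawley's expansion
        E[l_i] ~= lam_i * (1 + (1/n) * sum_{j != i} lam_j / (lam_i - lam_j)),
    with the sample eigenvalues plugged into the right-hand side.
    Assumes well-separated eigenvalues; nearly equal l_i, l_j blow up the terms.
    """
    corrected = np.array(l, dtype=float, copy=True)
    for i in range(len(l)):
        others = np.delete(l, i)
        bias = (others / (l[i] - others)).sum() / n
        corrected[i] = l[i] * (1.0 - bias)  # invert the multiplicative bias to first order
    return corrected

# Usage: population eigenvalues (5, 4, 3, 2, 1), n = 50 learning samples.
rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.standard_normal((n, p)) * np.sqrt(np.arange(p, 0, -1, dtype=float))
l = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print("sample eigenvalues :", np.round(l, 2))
print("Lawley de-biased   :", np.round(lawley_debias(l, n), 2))
```

The largest sample eigenvalues are biased upward and the smallest downward, so the de-biasing shrinks the spread of the eigenvalue sequence back toward the population one.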

Summary

Introduction

The Mahalanobis distance (MD) has been used for statistical learning as a basic discriminator under multidimensional normal distributions [1,2]. The distribution of the regularized T² observed in numerical experiments is unknown, so a theoretical analysis of the regularized T² is difficult. Another approach estimates the unknown Σ via metric learning [24,25,26,27], including information-theoretic metric learning based on tr(A) − log det A [28]. All of these could potentially outperform the MD, but they require the matrix A to be optimized iteratively using complicated techniques, and the resulting distances with the optimized A have an unknown distribution. This paper proposes a method for correcting T² toward D² using only the sample eigenvalues of the S that defines T², together with the dimensionality p and the number of learning samples n. “Appendix A” lists the common notations used, and “Appendix B” details the simulation procedures used in the paper.
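
To make these quantities concrete, here is a short sketch (an illustration under the stated definitions, with x estimated by the sample mean x̄; not the paper’s code) that computes T² from n learning samples and decomposes it into the principal components tᵢ² = (eᵢᵀ(y − x̄))²/lᵢ referred to by the outline entry “Proposed models tᵢ²”, so that T² = Σᵢ tᵢ². The paper’s correction rescales each component by its approximate expectation; that step is omitted here because it depends on the delta-method expressions derived in the paper.

```python
import numpy as np

def smd_components(X, y):
    """Sample Mahalanobis distance of y and its principal components.

    X : (n, p) learning samples;  y : (p,) test point.
    Returns (T2, t2) with T2 == t2.sum(), where
    t2[i] = (e_i' (y - xbar))^2 / l_i for the i-th sample
    eigenpair (l_i, e_i) of the sample covariance S.
    """
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    l, E = np.linalg.eigh(S)   # eigenvalues (ascending) and orthonormal eigenvectors
    z = E.T @ (y - xbar)       # coordinates of y - xbar in the sample eigenbasis
    t2 = z**2 / l              # principal components of T^2
    return t2.sum(), t2

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))   # n = 30 samples, p = 4 dimensions
y = rng.standard_normal(4)
T2, t2 = smd_components(X, y)
print("T^2 =", round(T2, 3), " components:", np.round(t2, 3))
```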

Theory
Background
Proposed models tᵢ²
Monte Carlo simulation of MDs
Example of application
Conclusion
Compliance with ethical standards
Appendix A: Common notations
EM algorithm