Abstract

Mahalanobis distance may be used as a measure of the disparity between an individual’s profile of scores and the average profile of a population of controls. The degree to which the individual’s profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual. Several estimators of this proportion are examined. These include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method. Simulations show that some estimators, including the commonly used plug-in maximum likelihood estimators, can have substantial bias for small or moderate sample sizes. The polynomial approximations yield estimators that have low bias, with the quadrature method marginally preferable to Bernstein polynomials. However, the polynomial estimators sometimes yield infeasible estimates that are outside the 0–1 range. While none of the estimators is perfectly unbiased, the median estimators match their definition: in simulations their estimates of the proportion have a median error close to zero. The standard median estimator can give unrealistically small estimates (including 0), and an adjustment is proposed that ensures estimates are always credible. This adjusted median estimator has much to recommend it when unbiasedness is not of paramount importance, while the quadrature method is recommended when bias is the dominant issue.

Highlights

  • The Mahalanobis distance was developed by Mahalanobis (1936) as a distance measure that incorporates the correlation between different scores.

  • We propose some alternative estimators of P, the proportion of the population with a larger Mahalanobis distance than the case, and compare their bias and root mean square error in a simulation study.

  • The third estimator in this group is a Bayesian estimator; it is based on the idea of probability matching priors and is denoted by PBY. We propose two further estimators of P based on the mean of the non-centrality parameter of a non-central F distribution; these are denoted by PM and PR.


Summary

Introduction

The Mahalanobis distance is frequently used in multivariate analysis as a statistical measure of distance between a vector of scores for a single case and the mean vector of the underlying population or a sample of data. The commonly used estimates of P, the proportion of the population with a larger Mahalanobis distance than the case, are the p-value computed from the chi-square distribution of the sample Mahalanobis index, or the p-value from the central F distribution associated with Hotelling’s T² test. In remote sensing image analysis, Foody (2006) was interested in measuring the closeness of an image pixel to a single class centroid. He calculated the Mahalanobis distance of a particular image pixel from a specified class centroid and converted it to its associated p-value from the chi-square distribution.
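The two commonly used estimates mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's own code. It assumes multivariate normal scores, a control sample of n cases on k variables, and a sample Mahalanobis index D² already computed from the sample mean and the sample covariance matrix (n − 1 divisor). The scaling of D² to an F statistic is the standard Hotelling result for comparing one new observation with a sample, which is assumed here to be what the F-based p-value refers to. All function names are hypothetical, and only the Python standard library is used.

```python
import math

def chi2_sf(x, k):
    """P(chi-square with integer k d.f. > x), using a closed-form recurrence."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1)
    while a < k / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:   # check after the odd step, as is usual
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """P(F with (d1, d2) d.f. > f)."""
    if f <= 0:
        return 1.0
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def chi2_plugin(d2_index, k):
    """Chi-square p-value of the sample Mahalanobis index D^2."""
    return chi2_sf(d2_index, k)

def f_plugin(d2_index, n, k):
    """F p-value: n(n - k) D^2 / ((n + 1) k (n - 1)) referred to F(k, n - k)."""
    if n <= k:
        raise ValueError("need n > k")
    f = n * (n - k) * d2_index / ((n + 1) * k * (n - 1))
    return f_sf(f, k, n - k)

# Hypothetical inputs chosen only for illustration: D^2 = 6, n = 12, k = 2.
print(chi2_plugin(6.0, 2), f_plugin(6.0, 12, 2))
```

With a small control sample the F-based value is noticeably larger than the chi-square value for the same D², because the F reference distribution allows for the fact that the mean and covariance were estimated from the controls.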

Two plug-in maximum likelihood estimators of P
Classical estimator of the median
Modified estimator of the median
Bayesian probability matching
Estimators based on the mean of λ
An estimator based on a Taylor expansion
Estimators based on polynomial approximations
Bernstein polynomials approximation
Quadrature polynomial approximation
Simulation results
Ranges of estimates
Performances as measured by absolute error
Findings
Concluding comments
