Simplicial and Minimal-Variance Distances in Multivariate Data Analysis

Jonathan Gillard, Anatoly Zhigljavsky, Emily O’Riordan

doi:10.1007/s42519-021-00227-7


Open Access

https://doi.org/10.1007/s42519-021-00227-7


Journal: Journal of Statistical Theory and Practice	Publication Date: Jan 21, 2022
Citations: 1	License type: open-access

Affiliation: Cardiff University

Abstract

In this paper, we study the behaviour of the so-called k-simplicial distances and k-minimal-variance distances between a point and a sample. The family of k-simplicial distances includes the Euclidean distance, the Mahalanobis distance, Oja’s simplex distance and many others. We give recommendations about the choice of parameters used to calculate the distances, including the size of the sub-sample of simplices used to improve computation time, if needed. We introduce a new family of distances which we call k-minimal-variance distances. Each of these distances is constructed using polynomials in the sample covariance matrix, with the aim of providing an alternative to the inverse covariance matrix that is applicable when the data are degenerate. We explore some applications of the considered distances, including outlier detection and clustering, and compare how the behaviour of the distances changes with different parameter choices.

Highlights

The Mahalanobis distance is one of the most useful tools in multivariate data science, underpinning a huge variety of practical data analysis methods
We explore the choice of the parameter k, and show that k can be relatively low to produce good results, making the k-minimal-variance distance a quick and viable alternative to the Mahalanobis distance
We prove a theorem comparing the variances of the squared Euclidean distance, the Mahalanobis distance and the k-simplicial distance with k = 2 and δ = 2

Summary

Introduction

The Mahalanobis distance is one of the most useful tools in multivariate data science, underpinning a huge variety of practical data analysis methods. This distance measures the proximity of a point x ∈ R^d to a d-dimensional set of points X = {x_1, ...}. It was introduced in Mahalanobis [27]. The Mahalanobis distance corresponds to the Euclidean distance in the standardized space where variables are uncorrelated.
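The correspondence between the Mahalanobis distance and the Euclidean distance in a standardized space can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the usual squared form (x - mu)^T S^{-1} (x - mu) with the sample mean mu and sample covariance S, and it assumes S is non-singular. The function names `mahalanobis_sq` and `whiten`, the mixing matrix `A` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def mahalanobis_sq(x, X):
    """Squared Mahalanobis distance from point x to sample X (rows are observations)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # d x d sample covariance (assumed non-singular)
    diff = x - mu
    return float(diff @ np.linalg.solve(S, diff))

def whiten(X):
    """Return the sample mean and the symmetric matrix W = S^{-1/2}."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)     # S is symmetric, so eigh applies
    W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return mu, W

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, -0.3, 0.7]])
X = rng.normal(size=(500, 3)) @ A.T
x = np.array([1.0, -2.0, 0.5])

mu, W = whiten(X)
d_mahal = mahalanobis_sq(x, X)
# Squared Euclidean distance from the origin after centering and whitening.
d_eucl = float(np.sum((W @ (x - mu)) ** 2))
# Covariance of the whitened sample; should be close to the identity.
cov_whitened = np.cov((X - mu) @ W, rowvar=False)
```

Under these assumptions the two quantities `d_mahal` and `d_eucl` agree up to floating-point error, and `cov_whitened` is numerically the identity matrix. The sketch breaks down when S is singular, which is the degenerate case the abstract mentions as motivation for alternatives to the inverse covariance matrix.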


Choosing k in the k-Simplicial Distance


Numerical Computation of k-Simplicial Distances Using Sub-Sampling


Outlier Labelling Example


Efficiency of k-Minimal-Variance Distances Compared to k-Simplicial Distances


Conclusion


Findings



