Abstract
This article introduces a new formulation of, and method of computation for, the projection median. We also explore its behaviour in a specific bivariate setup, providing the first theoretical result on the form of the influence curve for the projection median, accompanied by numerical simulations. Using new simulations, we comprehensively compare the performance of our method with that of an established method for computing the projection median, as well as with other existing multivariate medians. We focus on answering questions about accuracy and computational speed, whilst taking into account the underlying dimensionality. Such considerations are vitally important when the data set is large or when the operations have to be repeated many times, since some well-known techniques are extremely computationally expensive. We briefly describe our associated R package, which includes our new methods and novel functionality to produce animated multidimensional projection quantile plots, and exhibit its use on some high-dimensional data examples.
Highlights
Overview of multivariate medians. The median is an estimator of location that is robust, i.e. not heavily influenced by outlying values, which are, loosely speaking, points that are far from the main body of the data
We have introduced a new method, yamm, to compute the projection median, for data in R^n with n ≥ 2
We have proved the theoretical equivalence of yamm and the projection median
Summary
The median is an estimator of location that is robust, i.e. not heavily influenced by outlying values, which are, loosely speaking, points that lie far from the main body of the data. The univariate population median functional is denoted M(F). There are several equivalent definitions of the univariate median, all of which yield the same unique value μ of the true median for a distribution F with a bounded and continuous density f at μ. Several multivariate median concepts have been developed that retain some characteristics of the univariate median. We briefly review some of them here, not least because we use them later in our simulation study. Oja’s median [5], for example, provides an alternative to the spatial median, but it is known to be more computationally expensive than other choices.
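To make the robustness idea concrete, the following is a minimal sketch of a bivariate projection median, assuming the standard definition as the dimension times the average, over unit directions u, of the univariate median of the projected data multiplied by u. It approximates that average with equally spaced angles on the unit circle. This is not the yamm algorithm or the R package's implementation; the function name `projection_median_2d` and the parameter `n_angles` are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def projection_median_2d(points, n_angles=360):
    """Approximate the bivariate projection median of `points`.

    Assumed definition: PM = d * E_u[ med(u . X) * u ], with d = 2 and
    u uniform on the unit circle. The expectation is replaced by an
    average over `n_angles` equally spaced directions (a quadrature
    sketch, not the method proposed in the article).
    """
    acc_x = 0.0
    acc_y = 0.0
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        ux, uy = math.cos(theta), math.sin(theta)
        # Univariate median of the data projected onto direction u.
        m = median(px * ux + py * uy for px, py in points)
        acc_x += m * ux
        acc_y += m * uy
    # The factor 2 is the dimension d in the assumed definition.
    return (2.0 * acc_x / n_angles, 2.0 * acc_y / n_angles)


def mean_2d(points):
    """Componentwise sample mean, used here only as a non-robust baseline."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Illustrative data (made up): five points around the origin plus one outlier.
clean = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
contaminated = clean + [(100.0, 100.0)]

pm = projection_median_2d(contaminated)
mu = mean_2d(contaminated)
```

In this sketch the single outlier drags the sample mean far from the bulk of the data, whereas each projected median stays within the range of the uncontaminated points, so the approximate projection median remains close to the origin. The cost grows with the number of directions, which is one reason computational speed matters for this family of estimators.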