Asymptotics of a clustering criterion for smooth distributions

Karthik Bharath, Dipak K. Dey, Vladimir Pozdnyakov

doi:10.1214/13-ejs801

Abstract

We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. The proposed clustering criterion is a linear combination of order statistics. We examine the asymptotic behavior of the point at which the observations are split into two clusters. The results can be used to construct an interval estimate of this split point and to develop tests for bimodality and for the presence of clusters.
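The central object of the abstract is the point at which a sample is split into two clusters. As a minimal sketch, assuming the classical Hartigan-style criterion (maximize the between-cluster sum of squares over all splits of the ordered sample) rather than the paper's own criterion, the empirical split can be found with cumulative sums after one sort. `empirical_split` is a hypothetical helper name, and reporting the threshold as the midpoint of the two order statistics on either side of the split is our own convention.

```python
import numpy as np


def empirical_split(x) -> tuple[int, float]:
    """Split a univariate sample into a lower and an upper cluster.

    Illustrative assumption: the split maximizes the between-cluster sum of
    squares k * (n - k) / n * (mean_upper - mean_lower) ** 2 over k = 1..n-1,
    where the lower cluster is the k smallest observations.
    Returns (k, threshold); the threshold convention (midpoint of X_(k) and
    X_(k+1)) is hypothetical.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if n < 2:
        raise ValueError("need at least two observations")
    k = np.arange(1, n)                             # candidate lower-cluster sizes
    csum = np.cumsum(xs)
    lower_mean = csum[:-1] / k                      # mean of X_(1), ..., X_(k)
    upper_mean = (csum[-1] - csum[:-1]) / (n - k)   # mean of X_(k+1), ..., X_(n)
    score = k * (n - k) / n * (upper_mean - lower_mean) ** 2
    k_best = int(k[np.argmax(score)])
    threshold = 0.5 * (xs[k_best - 1] + xs[k_best])
    return k_best, float(threshold)
```

For example, on `[5.1, 0.0, 5.2, 0.1, 5.0, 0.2]` this sketch puts the three smallest values in the lower cluster and reports a threshold of 2.6. The identity "between-cluster SS = k(n-k)/n times the squared difference of the cluster means" is what makes the single pass over cumulative sums possible.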

Highlights

We develop a general framework for univariate clustering, based on the ideas in Hartigan (1978), for observations from a population with a smooth and invertible distribution function.
In contrast to Hartigan's approach, which is based on a quadratic function of the observed data, our clustering criterion function has the advantage of being a linear combination of order statistics: it combines trimmed sums and sample quantiles.
We depart from Hartigan's framework and focus on a function of the derivative of his split function. This lets us dispense with the finite fourth moment assumption that Hartigan imposed in the asymptotic analysis of his criterion function: a finite second moment suffices, at the cost of an additional smoothness condition on our criterion function.
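To see why a criterion built from the derivative of a split function reduces to trimmed sums and a sample quantile, here is a short derivation. It assumes, for illustration only, that the split function is the population between-cluster sum of squares of a split at quantile level p; the notation Q, μ₁, μ₂, B, G is ours and the paper's exact definitions and normalization may differ.

```latex
% Assumed notation (ours): Q = F^{-1}, 0 < p < 1.
\mu_1(p) = \frac{1}{p}\int_0^p Q(t)\,dt, \qquad
\mu_2(p) = \frac{1}{1-p}\int_p^1 Q(t)\,dt, \qquad
B(p) = p(1-p)\,\bigl(\mu_2(p) - \mu_1(p)\bigr)^2 .

% Using \mu_1'(p) = (Q(p) - \mu_1(p))/p and \mu_2'(p) = (\mu_2(p) - Q(p))/(1-p):
B'(p) = 2\bigl(\mu_2(p) - \mu_1(p)\bigr)\, G(p), \qquad
G(p) = \tfrac{1}{2}\bigl(\mu_1(p) + \mu_2(p)\bigr) - Q(p).

% Empirical analogue, order statistics X_{(1)} \le \dots \le X_{(n)}, k = 1, \dots, n-1:
G_n(k/n) = \frac{1}{2}\Bigl(\frac{1}{k}\sum_{i=1}^{k} X_{(i)}
  + \frac{1}{n-k}\sum_{i=k+1}^{n} X_{(i)}\Bigr) - X_{(k)} .
```

When F is continuous and strictly increasing, μ₂(p) > μ₁(p) on (0, 1), so B increases where G > 0 and decreases where G < 0; local maxima of B are downward zero-crossings of G. The empirical G_n is a weighted sum of order statistics (two trimmed means and one sample quantile), which matches in form the criterion described above; using X_(k) as the sample quantile is one of several possible conventions.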

Summary

Introduction

We develop a general framework for univariate clustering, based on the ideas in Hartigan (1978), for observations from a population with a smooth and invertible distribution function. One important example is modeling in continuous-time mathematical finance, where observations are typically increments of a continuous-time stochastic process and have smooth distributions because of the presence of Itô integral components. With this in mind, we depart from Hartigan's framework and focus on a function of the derivative of his split function. Holzmann and Vollmer (2008) proposed a parametric test for bimodality, based on the likelihood principle, using two-component mixtures. Their method was applied to investigate the modal structure of the cross-sectional distribution of per-capita log GDP across EU regions.
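The key move here is to work with a function of the derivative of the split function rather than the split function itself. A minimal sketch of the empirical side, assuming the cross-over function has the midpoint form G_n(k/n) = (lower trimmed mean + upper trimmed mean)/2 - X_(k) (our assumption; the paper's normalization may differ), evaluates it for every k and reports where it changes sign from positive to non-positive, the discrete analogue of a zero of the derivative. `empirical_crossover` and `crossover_split_indices` are hypothetical names.

```python
import numpy as np


def empirical_crossover(x) -> tuple[np.ndarray, np.ndarray]:
    """Evaluate a hypothetical empirical cross-over function for k = 1..n-1.

    Assumed form: G_n(k/n) = (mean of X_(1..k) + mean of X_(k+1..n)) / 2 - X_(k),
    a linear combination of order statistics whose weights depend on k.
    Returns (k, G_n values).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if n < 2:
        raise ValueError("need at least two observations")
    k = np.arange(1, n)                             # size of the lower cluster
    csum = np.cumsum(xs)
    lower_mean = csum[:-1] / k                      # trimmed mean of the k smallest
    upper_mean = (csum[-1] - csum[:-1]) / (n - k)   # trimmed mean of the n - k largest
    g = 0.5 * (lower_mean + upper_mean) - xs[k - 1] # subtract the order statistic X_(k)
    return k, g


def crossover_split_indices(x) -> np.ndarray:
    """Return each k where G_n is positive at k and non-positive at k + 1.

    Under the assumed form, these k mark candidate splits into
    X_(1..k) and X_(k+1..n).
    """
    k, g = empirical_crossover(x)
    idx = np.flatnonzero((g[:-1] > 0) & (g[1:] <= 0))
    return k[idx]
```

A sign change alone is not evidence of two clusters, since a unimodal sample can also produce one; deciding that is the role of the asymptotic distribution of the split point described in the abstract.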

Assumptions

Empirical Cross-over Function and Empirical Split Point

Main Results

Numerical Verification

An Example

Discussion


Journal: Electronic Journal of Statistics
Publication Date: Jan 1, 2013
Citations: 9
License type: CC BY


Similar Papers

Efficient Estimation of Parameters of the Extreme Value Distribution
S R Saha ... S Mandal
Sankhya B, Vol. 76, 30 Nov 2013

Edgeworth Expansions for Linear Combinations of Order Statistics with Smooth Weight Functions
R Helmers
The Annals of Statistics, Vol. 8, 01 Nov 1980

An invariance principle for linear combinations of order statistics
Pranab Sen
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, Vol. 42, 01 Jan 1978

A Berry-Esseen Theorem for Linear Combinations of Order Statistics
R Helmers
The Annals of Probability, Vol. 9, 01 Apr 1981
