Abstract

We propose a kernel function for ordered categorical data that overcomes certain limitations of the ordered kernel functions that have appeared in the literature on estimating probability mass functions for multinomial ordered data. Some of these limitations arise from assumptions about the support of the random variable that may be at odds with the data at hand. Furthermore, many existing ordered kernel functions lack a particularly appealing property, namely the ability to deliver discrete uniform probability estimates for some value of the smoothing parameter. To overcome these limitations, we propose an asymmetric empirical support kernel function that adapts to the data at hand and possesses certain desirable features. In particular, zero counts caused by gaps in the data pose no difficulty, and the estimator encompasses both the empirical proportions and the discrete uniform probabilities, at the lower and upper boundaries of the smoothing parameter respectively. We propose using likelihood and least squares cross-validation for smoothing parameter selection, and study the asymptotic behaviour of these data-driven methods. We use Monte Carlo simulations to examine the finite sample performance of the proposed estimator, and we provide a simple empirical example to illustrate its usefulness in applied settings.
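The two boundary properties described above can be illustrated with a minimal sketch. The kernel used here is an assumption made purely for illustration (a row-normalized geometric weight in the rank distance on the observed support); it is not claimed to be the functional form proposed in the paper, and the names `ordered_pmf` and `lam` are hypothetical. The sketch builds the support from the observed values only, so a gap such as the one between 2 and 5 creates no empty cells.

```python
import numpy as np

def ordered_pmf(data, lam):
    """Kernel-smoothed PMF on the empirical support (illustrative kernel only)."""
    data = np.asarray(data)
    support = np.unique(data)               # sorted distinct observed values; gaps are fine
    ranks = np.searchsorted(support, data)  # position of each observation within the support
    # Rank distance between each observation (rows) and each support point (columns).
    dist = np.abs(ranks[:, None] - np.arange(support.size)[None, :])
    weights = np.power(float(lam), dist)    # 0.0 ** 0 == 1.0, so lam = 0 yields an indicator
    weights /= weights.sum(axis=1, keepdims=True)  # each observation's weights sum to one
    return support, weights.mean(axis=0)

data = [0, 0, 0, 1, 1, 2, 5, 5]
support, p_lower = ordered_pmf(data, 0.0)  # lower boundary: empirical proportions
_, p_upper = ordered_pmf(data, 1.0)        # upper boundary: discrete uniform on the support
_, p_mid = ordered_pmf(data, 0.3)          # interior value: smoothed estimate
```

Because each observation's weights are normalized over the observed support, the estimate sums to one for every value of `lam` in [0, 1], and observations near the edge of the support spread their weight asymmetrically.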

Highlights

  • In multinomial discrete support random variable settings, it is common to encounter situations in which the support contains only a handful of values, and such values may contain gaps (e.g., {0, 1, 2, 5}). When such data are of the ordered type, using a kernel function that recognizes the order present in the data can lead to improved accuracy relative to kernel functions that ignore order.

  • Hall (1987) considered likelihood cross-validation in a density estimation context and demonstrated that its asymptotic properties are profoundly influenced by the tail properties of the kernel function and of the unknown density function. Our approach is immune to this phenomenon: because we explicitly treat our problem as one having finite support, there is no “tail” in the sense of Hall (1987).

  • The ordered kernel functions of Ryzin (1981) and Ahmad and Cerrito (1994) presume that the support is the set of all consecutive integers, which may not be the case for the data at hand, and there is no value of the smoothing parameter for which these kernel functions are the discrete uniform (the same goes for the ordered kernels proposed by Rajagopalan and Lall (1995), Chu et al (2017) and others).


Summary

Introduction

In multinomial discrete support random variable settings, it is common to encounter situations in which the support contains only a handful of values, and such values may contain gaps (e.g., {0, 1, 2, 5}). When such data are of the ordered type, using a kernel function that recognizes the order present in the data can lead to improved accuracy relative to kernel functions that ignore order (e.g., binary unordered counting kernel functions). The proposed approach exhibits better finite sample performance than estimators based on kernel functions that ignore the order present in the data, and better than the empirical proportions themselves. Hall (1987) considered likelihood cross-validation in a density estimation context and demonstrated that its asymptotic properties are profoundly influenced by the tail properties of the kernel function and of the unknown density function. Our approach is immune to this phenomenon: because we explicitly treat our problem as one having finite support, there is no “tail” in the sense of Hall (1987).
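Likelihood cross-validation on a finite support can be sketched as a grid search over the smoothing parameter that maximizes the leave-one-out log-likelihood. The sketch below is illustrative only: the row-normalized geometric kernel is an assumed stand-in rather than the kernel proposed in the paper, and `loo_log_likelihood`, `select_lambda` and the grid are hypothetical names and choices.

```python
import numpy as np

def loo_log_likelihood(data, lam):
    """Leave-one-out log-likelihood for an illustrative ordered kernel."""
    data = np.asarray(data)
    n = data.size
    support = np.unique(data)               # empirical support, gaps allowed
    ranks = np.searchsorted(support, data)
    dist = np.abs(ranks[:, None] - np.arange(support.size)[None, :])
    w = np.power(float(lam), dist)
    w /= w.sum(axis=1, keepdims=True)       # w[j, k]: weight observation j puts on support point k
    at_obs = w[:, ranks]                    # at_obs[j, i]: weight observation j puts on the value of observation i
    np.fill_diagonal(at_obs, 0.0)           # drop each observation's contribution to its own estimate
    loo = at_obs.sum(axis=0) / (n - 1)
    with np.errstate(divide="ignore"):      # log(0) = -inf is a valid (worst possible) score
        return np.log(loo).sum()

def select_lambda(data, grid=None):
    """Pick the grid value of lam with the largest leave-one-out log-likelihood."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else np.asarray(grid)
    scores = np.array([loo_log_likelihood(data, lam) for lam in grid])
    return grid[np.argmax(scores)]

data = [0, 0, 0, 1, 1, 2, 5, 5]
lam_hat = select_lambda(data)
```

In this example the value 2 is observed only once, so at `lam = 0` its leave-one-out estimate is zero and the criterion is minus infinity; the search therefore moves away from the pure empirical proportions. At `lam = 1` every leave-one-out estimate equals one over the number of support points, so the criterion is always finite there.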

Background
Data-driven Smoothing Parameter Selection Methods
Monte Carlo Simulation
The Multivariate Case
Multivariate Data-Driven Smoothing Parameter Selection
Empirical Illustration
Summary
Some Useful Lemmas