Abstract

Unbiased cross-validation (UCV) is a commonly used method for selecting the optimal bandwidth of the kernel density estimator (KDE), which estimates the underlying probability density function (PDF) of a given data set. Since the UCV method was proposed, few studies have pointed out its instability in determining the KDE bandwidth. Following the principle of stability improvement, this paper presents a novel ensemble UCV-based KDE (EUCV-KDE), which determines the expectation of an estimated PDF using an ensemble of data-block-based UCVs rather than a single data-point-based UCV. To derive the optimal bandwidth, a novel objective function is designed for EUCV-KDE that considers the empirical risk and the structural risk of the KDE together. We validate the rationality and effectiveness of EUCV-KDE on 10 probability distributions. The experimental results show that EUCV-KDE converges as the number of data-block-based UCVs increases and achieves more stable and better prediction performance than the classical UCV-KDE and the revisited cross-validation (RCV) based KDE (RCV-KDE). In addition, a real-world application based on UK climate data further validates the effectiveness of EUCV-KDE by determining the optimal bandwidth for the Nadaraya-Watson kernel regression estimator.
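
For readers unfamiliar with the baseline, the following is a minimal sketch of classical UCV bandwidth selection, not of EUCV-KDE itself, whose objective function is not given in this abstract. It assumes a Gaussian kernel and a simple grid search; the function names `ucv_score` and `ucv_bandwidth`, the sample, and the grid are hypothetical choices made for illustration. The score is the standard UCV criterion: the integral of the squared density estimate minus twice the average leave-one-out density at the data points.

```python
import math
import random


def ucv_score(x, h):
    """UCV criterion for a Gaussian-kernel KDE with bandwidth h (illustrative)."""
    n = len(x)
    conv_sum = 0.0  # pairwise terms for the integral of fhat^2
    loo_sum = 0.0   # pairwise terms for the leave-one-out densities
    for i in range(n):
        for j in range(n):
            u = (x[i] - x[j]) / h
            # Two Gaussian kernels convolve to an N(0, 2) density.
            conv_sum += math.exp(-0.25 * u * u)
            if i != j:
                loo_sum += math.exp(-0.5 * u * u)
    integral_term = conv_sum / (2.0 * math.sqrt(math.pi) * n * n * h)
    loo_term = 2.0 * loo_sum / (math.sqrt(2.0 * math.pi) * n * (n - 1) * h)
    return integral_term - loo_term


def ucv_bandwidth(x, grid):
    """Return the grid bandwidth with the smallest UCV score, plus all scores."""
    scores = [ucv_score(x, h) for h in grid]
    best = min(range(len(grid)), key=lambda k: scores[k])
    return grid[best], scores


# Assumed toy data: 100 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 + 0.05 * k for k in range(30)]  # candidate bandwidths 0.05 .. 1.50
h_ucv, scores = ucv_bandwidth(sample, grid)
print("selected bandwidth:", h_ucv)
```

Repeating the selection on different samples drawn from the same distribution is one simple way to observe how much the chosen bandwidth varies from sample to sample, which is the kind of instability the abstract refers to.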
