Abstract

In the context of frequency histogram selection, we propose a modification of Akaike's Information Criterion that avoids overfitting, even when the sample size is small. We call this correction an over-penalization procedure. We emphasize that the principle of unbiased risk estimation for model selection can indeed be improved by taking excess-risk deviations into account in the design of the penalization procedure. On the theoretical side, we prove sharp oracle inequalities for the Kullback-Leibler divergence. These inequalities are valid with positive probability for any sample size and cover the estimation of unbounded log-densities. In the course of the proofs, we derive several analytical lemmas related to the Kullback-Leibler divergence, as well as concentration inequalities, that are of independent interest. In a simulation study, we also demonstrate state-of-the-art performance of our over-penalization criterion for bin size selection, in particular outperforming the AICc procedure.
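To make the idea concrete, here is a minimal sketch (in Python, using NumPy) of AIC-type bin-count selection for a regular histogram with an additional over-penalization term. The function name and the specific form of the extra term (c * sqrt(K) / n) are illustrative assumptions only; they are not the exact correction proposed in the paper.

```python
import numpy as np

def histogram_aic_overpenalized(x, max_bins=50, c=1.0):
    """Select the number of bins of a regular histogram on [min(x), max(x)]
    by minimizing an AIC-type criterion inflated by an over-penalization term.
    The term c * sqrt(k) / n is a placeholder illustrating the idea of
    enlarging the AIC penalty for small samples, not the paper's criterion."""
    n = len(x)
    lo, hi = x.min(), x.max()
    best_k, best_crit = 1, np.inf
    for k in range(1, max_bins + 1):
        counts, edges = np.histogram(x, bins=k, range=(lo, hi))
        widths = np.diff(edges)
        # Maximum-likelihood histogram density estimate on each bin.
        dens = counts / (n * widths)
        # Empirical log-likelihood; empty bins contribute nothing.
        nonzero = counts > 0
        loglik = np.sum(counts[nonzero] * np.log(dens[nonzero]))
        # AIC penalty (k - 1 free parameters) plus the over-penalization term.
        crit = -loglik / n + (k - 1) / n + c * np.sqrt(k) / n
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k

# Example: select the bin count for a small Gaussian sample.
rng = np.random.default_rng(0)
sample = rng.normal(size=100)
print(histogram_aic_overpenalized(sample))
```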
