Abstract

We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the change-points. In this general exponential family framework, we propose a penalized log-likelihood estimator where the penalty is inspired by papers of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. We then further study the particular case of categorical variables by comparing the values of the key constants when derived from the specification of our general approach and when obtained by working directly with the characteristics of this distribution. Finally, a simulation study is conducted to assess the performance of our criterion for the exponential distribution, and an application on real data modeled by the categorical distribution is provided.

Highlights

  • Segmentation, the partition of a profile into segments of homogeneous distribution, is a very useful tool for the modelization and interpretation of complex data

  • We have proposed a general approach to the selection of the number of segments in the segmentation framework where the data can be modeled using a distribution from the exponential family

  • The log-partition function and its many properties are instrumental in the computation of the bounds and the derivation of the oracle inequality

Introduction

Segmentation, the partition of a profile into segments of homogeneous distribution, is a very useful tool for the modelization and interpretation of complex data. A crucial step, which is the main difficulty in segmentation approaches, is the choice of the number of segments K. To this end, a huge effort has been made in the last two decades to derive estimators of K and their properties. While, in this context, almost all methods for choosing the number of segments can be seen as penalized-likelihood approaches (Akaike Information Criterion [1], Bayes Information Criterion [38], Integrated Completed Likelihood [36], etc.), we and other authors (see for instance [29, 8, 39]) have previously emphasized how crucial the choice of the penalty function is in contexts such as segmentation, where the size of the collection of models grows with the size of the data.
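To make the penalized-likelihood idea above concrete, here is a minimal sketch of selecting the number of segments K for exponential-distributed data (the distribution used in the paper's simulation study). It computes, by dynamic programming, the maximum log-likelihood attainable with exactly K segments, then minimizes a penalized criterion. The penalty used here, `beta * K` with a BIC-style default `beta = log(n)`, is a simple stand-in and is not the Birgé–Massart-type penalty proposed in the paper; the function names and constants are illustrative assumptions.

```python
import numpy as np

def segment_loglik(y):
    """Max log-likelihood of each subsegment y[i:j] under an
    Exponential model with the MLE rate lambda = m / s,
    giving loglik = m * log(m / s) - m."""
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    ll = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            m, s = j - i, cs[j] - cs[i]
            ll[i, j] = m * np.log(m / s) - m
    return ll

def best_segmentations(y, k_max):
    """Dynamic programming: L[k, t] is the best log-likelihood of
    splitting y[:t] into exactly k segments."""
    n = len(y)
    ll = segment_loglik(y)
    L = np.full((k_max + 1, n + 1), -np.inf)
    L[0, 0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(k, n + 1):
            L[k, t] = max(L[k - 1, u] + ll[u, t] for u in range(k - 1, t))
    return L[1:, n]  # best achievable log-likelihood for K = 1..k_max

def select_K(y, k_max, beta=None):
    """Choose K minimizing -loglik(K) + beta * K.
    beta = log(n) is a hypothetical BIC-style default, not the
    calibrated constant of the paper's penalty."""
    if beta is None:
        beta = np.log(len(y))
    ll = best_segmentations(y, k_max)
    crit = -ll + beta * np.arange(1, k_max + 1)
    return int(np.argmin(crit)) + 1
```

The quadratic table of segment log-likelihoods makes the sketch O(n^2) in memory; pruned or on-the-fly variants are used in practice for long profiles. Note that the maximal log-likelihood is nondecreasing in K, which is precisely why an additive penalty is needed to select K.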
