Abstract

We propose a simple yet powerful framework for modeling integer-valued data, such as counts, scores, and rounded data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions capable of modeling zero-inflation, bounded or censored data, and over- or underdispersion. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. An efficient MCMC algorithm is developed for posterior inference and provides a mechanism for adaptation of successful Bayesian models and algorithms for continuous data to the integer-valued data setting. Using the STAR framework, we design a new Bayesian Additive Regression Tree model for integer-valued data, which demonstrates impressive predictive distribution accuracy for both synthetic data and a large healthcare utilization dataset. For interpretable regression-based inference, we develop a STAR additive model, which offers greater flexibility and scalability than existing integer-valued models. The STAR additive model is applied to study the recent decline in Amazon river dolphins.

Highlights

  • MSC 2010 subject classifications: Primary 62F15, 62G08; secondary 62M20

  • We focus on conditionally Gaussian regression models, but the STAR framework applies more broadly

  • Post hoc rounding ignores the discrete nature of the data in model fitting and introduces a disconnect between the fitted model and the model used for prediction. STAR avoids this issue and maintains the benefits of using well-known models for continuous data while producing a coherent integer-valued predictive distribution


Summary

Simultaneously transforming and rounding

The STAR framework of (1)-(2) defines an integer-valued process for y. The transformation g, which we model as unknown, endows the integer-valued process y with greater distributional flexibility, yet leaves model (3) unchanged. This construction allows seamless integration of Bayesian models and algorithms for continuous data of the common form (3) into the integer-valued STAR framework, with efficient posterior inference available via a general MCMC algorithm (Section 2.3). Post hoc rounding ignores the discrete nature of the data in model fitting and introduces a disconnect between the fitted model and the model used for prediction. STAR avoids this issue and maintains the benefits of using well-known models for continuous data while producing a coherent integer-valued predictive distribution.
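The transform-then-round construction can be illustrated with a small simulation. Equations (1)-(3) are not reproduced on this page, so the sketch below rests on assumptions: a latent Gaussian z ~ N(mu, sigma^2), a fixed transformation g = log (the paper treats g as unknown), and rounding by the floor function so that any latent value below 1 maps to zero. The function names simulate_star and star_pmf are hypothetical and are not taken from the paper or any package.

```python
import numpy as np
from math import erf, sqrt, log

def simulate_star(mu, sigma, n, rng):
    """Draw n integers under the assumed model:
    z ~ N(mu, sigma^2), y* = exp(z) (inverse of g = log), y = floor(y*)."""
    z = rng.normal(mu, sigma, size=n)   # continuous latent process
    y_star = np.exp(z)                  # undo the assumed transformation g = log
    return np.floor(y_star).astype(int) # rounding yields nonnegative integers

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def star_pmf(k, mu, sigma):
    """P(y = k) = P(g(k) <= z < g(k + 1)) under the assumptions above.
    For k = 0 the lower limit is -infinity, so all z < 0 map to zero."""
    upper = normal_cdf((log(k + 1) - mu) / sigma)
    lower = 0.0 if k == 0 else normal_cdf((log(k) - mu) / sigma)
    return upper - lower

rng = np.random.default_rng(0)
y = simulate_star(mu=0.5, sigma=1.0, n=10_000, rng=rng)
empirical_zero = float(np.mean(y == 0))
model_zero = star_pmf(0, mu=0.5, sigma=1.0)
```

Under these assumptions the probability of a zero is the Gaussian mass below g(1) = 0, which is how a single continuous model can place substantial mass at zero without a separate zero-inflation component. Because the same rounding is used when fitting and when predicting, the predictive distribution is supported on the integers by construction, unlike post hoc rounding of a continuous fit.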

The transformation g
Model properties
Posterior inference
Regression modeling with STAR
Additive models
Bayesian additive regression trees
Simulation studies
Linear mean functions
Nonlinear mean functions
Predicting the demand for healthcare utilization
Modeling the decline in Amazon river dolphins
Evaluating point accuracy for synthetic data
Model and MCMC diagnostics for the dolphins data
Results
