Abstract
We propose a simple yet powerful framework for modeling integer-valued data, such as counts, scores, and rounded data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions capable of modeling zero-inflation, bounded or censored data, and over- or underdispersion. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. An efficient MCMC algorithm is developed for posterior inference and provides a mechanism for adaptation of successful Bayesian models and algorithms for continuous data to the integer-valued data setting. Using the STAR framework, we design a new Bayesian Additive Regression Tree model for integer-valued data, which demonstrates impressive predictive distribution accuracy for both synthetic data and a large healthcare utilization dataset. For interpretable regression-based inference, we develop a STAR additive model, which offers greater flexibility and scalability than existing integer-valued models. The STAR additive model is applied to study the recent decline in Amazon river dolphins.
Highlights
MSC 2010 subject classifications: Primary 62F15, 62G08; secondary 62M20
We focus on conditionally Gaussian regression models, but the STAR framework applies more broadly.
Post hoc rounding ignores the discrete nature of the data in model-fitting and introduces a disconnect between the fitted model and the model used for prediction. STAR avoids this issue: it retains the benefits of well-known models for continuous data while producing a coherent integer-valued predictive distribution.
Summary
The STAR framework of (1)-(2) defines an integer-valued process for y. The transformation g, which we model as unknown, endows y with greater distributional flexibility, yet leaves model (3) unchanged. This construction allows seamless integration of Bayesian models and algorithms for continuous data of the common form (3) into the integer-valued STAR framework, with efficient posterior inference available via a general MCMC algorithm (Section 2.3). Post hoc rounding, by contrast, ignores the discrete nature of the data in model-fitting and introduces a disconnect between the fitted model and the model used for prediction. STAR avoids this issue: it retains the benefits of well-known models for continuous data while producing a coherent integer-valued predictive distribution.
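To make the transform-then-round construction concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a latent continuous y* with g(y*) ~ N(mu, sigma^2) and an observed count y obtained by rounding y* down. Several pieces are illustrative assumptions not taken from the text above: the transformation is fixed to g = log (the framework treats g as unknown), the rounding operator maps y* in [j, j+1) to j with everything below 1 mapped to 0, mu and sigma are fixed constants rather than a regression model, and the function names are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def star_pmf(j, mu, sigma):
    """P(y = j) when log(y*) ~ N(mu, sigma^2) and y rounds y* down.

    Assumption: g = log, so y = j exactly when log(y*) falls in
    [log(j), log(j + 1)); for j = 0 the lower limit is -infinity.
    """
    upper = normal_cdf((math.log(j + 1) - mu) / sigma)
    lower = 0.0 if j == 0 else normal_cdf((math.log(j) - mu) / sigma)
    return upper - lower


def star_sample(mu, sigma, rng):
    """Draw one count: sample on the Gaussian scale, invert g, round down."""
    z_star = rng.gauss(mu, sigma)   # latent Gaussian draw
    y_star = math.exp(z_star)       # inverse of the assumed g = log
    return max(0, math.floor(y_star))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [star_sample(1.0, 0.5, rng) for _ in range(10000)]
    for j in range(5):
        empirical = draws.count(j) / len(draws)
        print(j, round(star_pmf(j, 1.0, 0.5), 4), round(empirical, 4))
```

Under these assumptions the simulated counts and the probability mass function come from the same model, so the predictive distribution is integer-valued by construction. That is the contrast with post hoc rounding, where a continuous model is fitted first and its predictions are rounded afterwards.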