Abstract

Beta regression models are a class of supervised learning tools for regression problems with a univariate, bounded response. Current fitting procedures for beta regression require variable selection based on (potentially problematic) information criteria. We propose model selection criteria that take into account the leverage, residuals, and influence of the observations, for both linear and nonlinear systematic components. To that end, we propose a Predictive Residual Sum of Squares (PRESS)-like machine learning tool and a prediction coefficient, namely the P2 statistic, as a computational procedure. Monte Carlo simulation results on the finite-sample behavior of the prediction-based model selection criterion P2 are provided. We also evaluate two versions of the R2 criterion. Finally, applications to real data are presented. The new criterion proved crucial for choosing models that take into account the robustness of the maximum likelihood estimation procedure in the presence of influential cases.

Highlights

  • The class of nonlinear beta regression models was proposed by [1] and extended to situations in which the data include zeros and/or ones by [2,3].

  • Model selection is a crucial step in data analysis, since all inferential performance depends on the selected model. [6] evaluated the behavior of different model selection criteria in beta regression, such as the Akaike Information Criterion (AIC) [7], the Schwarz Bayesian Criterion (SBC) [8], and various approaches based on pseudo-R2.

  • In this context, [9] proposed the PRESS (Predictive Residual Sum of Squares) statistic, which can be used as a measure of a model's predictive power. [10] proposed a coefficient of prediction based on PRESS, namely P2, which is analogous to the R2 measure.
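The PRESS idea above — score a model by how well it predicts each observation when that observation is left out of the fit — can be sketched numerically. The snippet below is a minimal illustration using an ordinary least squares fit as a stand-in for the beta regression model (the paper's actual procedure applies this to beta regression fits); it uses the standard hat-matrix shortcut for leave-one-out residuals, and the `press_p2` function name and the simple total-sum-of-squares normalization for P2 are assumptions made for this sketch.

```python
import numpy as np

def press_p2(x, y):
    """PRESS and a PRESS-based prediction coefficient for a linear fit.

    Leave-one-out prediction errors are obtained without refitting via
    the hat-matrix identity e_(i) = e_i / (1 - h_ii), where h_ii is the
    i-th leverage.  (A beta regression version would replace the OLS fit
    with the maximum likelihood fit; this is an illustrative stand-in.)
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])          # design matrix with intercept
    H = X @ np.linalg.solve(X.T @ X, X.T)         # hat (projection) matrix
    resid = y - H @ y                             # ordinary residuals
    loo_resid = resid / (1.0 - np.diag(H))        # leave-one-out residuals
    press = float(np.sum(loo_resid ** 2))         # predictive residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))      # total sum of squares (simplified normalizer)
    p2 = 1.0 - press / sst                        # P2: closer to 1 = better prediction
    return press, p2

# Example on synthetic data: a strong linear signal with small noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=50)
press, p2 = press_p2(x, y)
```

Because the leave-one-out residual inflates the ordinary residual by 1/(1 - h_ii), high-leverage points are penalized more heavily — which is exactly why a PRESS-based criterion is sensitive to influential observations in a way AIC and SBC are not.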


Summary

Introduction

The class of nonlinear beta regression models was proposed by [1] and extended to situations in which the data include zeros and/or ones by [2,3]. [6] evaluated the behavior of different model selection criteria in beta regression, such as the Akaike Information Criterion (AIC) [7], the Schwarz Bayesian Criterion (SBC) [8], and various approaches based on pseudo-R2. Models selected by the usual criteria commonly present poorly fitted or influential observations. The simulation and application results showed that small values of the new criterion indicate that the robustness of the model's maximum likelihood estimation procedure in the presence of influential points is worthy of further investigation; this information cannot be accessed by the usual selection criteria. The best strategy is to use the three criteria discussed here together to choose the best model, since each one carries different information.

P2 Criterion
Simulation Study
Linear Setting
Nonlinear Setting
Fluid Catalytic Cracking
Simultaneity Factor
Conclusions and Future Work