Abstract

Model selection criteria are widely used to identify the model that best represents the data among a set of potential candidates. Among the different model selection criteria, the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) are the most popular and the best understood. In the derivation of these indicators, it was assumed that the model’s dependent variables have already been properly identified and that the entries are not affected by significant uncertainties. These issues can become quite serious when investigating complex systems, especially when the variables are highly correlated and the measurement uncertainties associated with them are not negligible. More sophisticated versions of these criteria, capable of better detecting spurious relations between variables when non-negligible noise is present, are proposed in this paper. Their derivation starts from a Bayesian statistics framework and adds an a priori Chi-squared probability distribution function of the model, dependent on a specifically defined information-theoretic quantity that takes into account the redundancy between the dependent variables. The performance of the proposed versions of these criteria is assessed through a series of systematic simulations, using synthetic data for various classes of functions and noise levels. The results show that the upgraded formulations of the criteria clearly outperform the traditional ones in most of the cases reported.
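For reference, both classical criteria reward goodness of fit through the maximised likelihood and penalise the number of free parameters. The following minimal sketch (an illustration under the assumption of a least-squares fit with Gaussian residuals, not the upgraded formulation derived in the paper) shows how the traditional AIC and BIC are typically computed:

```python
import numpy as np

def aic_bic(residuals, n_params):
    """Classical AIC and BIC for a least-squares fit with Gaussian residuals.

    Under this assumption the maximised log-likelihood reduces to a function
    of the residual sum of squares, so only the residuals and the number of
    free parameters are needed.
    """
    residuals = np.asarray(residuals)
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    # Maximised Gaussian log-likelihood (up to model-independent constants).
    log_lik = -0.5 * n * np.log(rss / n)
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * np.log(n) - 2 * log_lik
    return aic, bic
```

Lower values indicate a better trade-off between fit and complexity; the BIC penalty per parameter, ln n, exceeds the AIC penalty of 2 as soon as n > e² ≈ 7.4, so BIC favours simpler models on all but the smallest data sets.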

Highlights

  • Introduction to model selection criteria based on Bayesian statistics and information theory: Model Selection (MS) can be defined as the task of identifying the model best supported by the data, among a set of potential candidates [1]

  • Given the importance of this functional dependence, and since the experimental case studied in the following belongs to this family, power laws are discussed first to illustrate the methodology of the test in detail

  • Employing the upgraded model selection criteria proposed in this work, the main objective of the analysis consists of identifying, among all the possible power-law models obtained by combining the predictor variables included in Equation (29), the one which best represents the τE data (see the sketch after this list)
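The last highlight can be made concrete with a short sketch. Assuming hypothetical predictor names, a log-linear ordinary-least-squares fit, and the classical BIC in place of the upgraded criteria derived in the paper, an exhaustive scan of the candidate power-law scalings could look as follows:

```python
import numpy as np
from itertools import combinations

def fit_power_law(y, X):
    """Fit y = C * prod(x_i ** a_i) by ordinary least squares in log space.

    Returns the estimated coefficients and the BIC of the fit (Gaussian
    residuals assumed, model-independent constants dropped).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), np.log(X)])  # intercept + log predictors
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    rss = np.sum((np.log(y) - A @ coef) ** 2)
    k = A.shape[1]
    bic = k * np.log(n) - 2 * (-0.5 * n * np.log(rss / n))
    return coef, bic

def select_power_law(y, predictors):
    """Rank every non-empty combination of candidate predictors by BIC."""
    names = list(predictors)
    ranking = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            X = np.column_stack([predictors[name] for name in subset])
            _, bic = fit_power_law(y, X)
            ranking.append((bic, subset))
    return sorted(ranking)  # lowest BIC first
```

A call such as select_power_law(tau_E, {'I_p': Ip, 'B_t': Bt, 'n_e': ne}) (names purely illustrative) returns the candidate scalings ordered from the most to the least supported; in the paper the ranking is driven by the upgraded criteria rather than by the plain BIC used here.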

Summary

Introduction

Model Selection (MS) can be defined as the task of identifying the model best supported by the data, among a set of potential candidates [1]. Essentially all approaches to model selection try to find a compromise between goodness of fit and complexity. At the same level of goodness of fit, simpler models are preferred, implementing some form of Occam’s razor. The goodness of fit is assessed with the likelihood or, when this is not possible, with some metric quantifying the residuals, i.e., the distance between the model predictions and the data. Attention will be focussed on model selection criteria (MSC) derived with the help of Bayesian statistics and information theory.
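The connection between Bayesian statistics and the BIC can be summarised in a few standard steps (a textbook derivation, not the specific development of this paper): the posterior probability of a model is proportional to its marginal likelihood times its prior, and a Laplace approximation of the marginal likelihood for n data points and k free parameters yields the familiar penalty term.

```latex
% Bayes' theorem at the level of models: posterior \propto evidence x prior
p(M \mid D) \;\propto\; p(D \mid M)\, p(M)

% Laplace approximation of the log evidence for large n
\ln p(D \mid M) \;\approx\; \ln \mathcal{L}(\hat{\theta}) - \tfrac{k}{2} \ln n

% Multiplying by -2 gives the Bayesian information criterion
\mathrm{BIC} \;=\; -2 \ln \mathcal{L}(\hat{\theta}) + k \ln n
```

Under a flat prior p(M), minimising the BIC is therefore equivalent to maximising the approximate posterior probability of the model; the criteria proposed in the paper act on p(M), replacing the flat prior with a Chi-squared distribution that depends on the redundancy between the dependent variables.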
