Abstract

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to a large extent, be viewed from a unified perspective.
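To make the "luckiness function" claim concrete, here is a worked equation of our own, paraphrasing the standard luckiness-based definition rather than quoting the paper: the MDL estimator with luckiness function v maximizes v(θ) p_θ(x^n), so that penalized maximum likelihood and Bayesian MAP estimation arise as special cases.

    \[
      \theta_v(x^n) \;=\; \arg\max_{\theta}\, v(\theta)\, p_\theta(x^n)
      \;=\;
      \begin{cases}
        \arg\min_{\theta}\,\bigl(-\log p_\theta(x^n) + \mathrm{pen}(\theta)\bigr)
          & \text{if } v(\theta) = e^{-\mathrm{pen}(\theta)} \ \text{(penalized ML)},\\[4pt]
        \arg\max_{\theta}\, \pi(\theta)\, p_\theta(x^n)
          & \text{if } v = \pi \ \text{is a prior density (Bayes MAP)}.
      \end{cases}
    \]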

Highlights

  • The Minimum Description Length (MDL) Principle [Rissanen, 1978, 1989, Barron et al., 1998, Grünwald, 2007] is a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition

  • Here we present, for the first time, the MDL Principle without resorting to information theory: all the material can be understood without any knowledge of data compression, which should make it a much easier read for statisticians and machine learning researchers new to MDL

  • Over the last 10 years, there have been exciting developments, some of them very recent, which resolve most of the issues with earlier formulations of MDL. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to some extent, be viewed from a unified perspective; as such, this paper should be of interest to researchers working on the foundations of statistics and machine learning

Summary

Introduction

The Minimum Description Length (MDL) Principle [Rissanen, 1978, 1989, Barron et al., 1998, Grünwald, 2007] is a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. Over the last 10 years, there have been exciting developments, some of them very recent, which resolve most of the issues with earlier formulations of MDL. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to some extent, be viewed from a unified perspective; as such, this paper should be of interest to researchers working on the foundations of statistics and machine learning. We introduce the basic MDL building blocks in a concise yet self-contained way, with substantial underlying motivation, in Section 2, incorporating the extensions to and new insights into these building blocks that have been gathered over the last 10 years. These include more general formulations of arguably the most fundamental universal code, the Normalized Maximum Likelihood (NML) distribution, as well as faster ways to compute it. We use θ to denote general estimators, and θ_v to denote what we call the MDL estimator with luckiness function v; see (5).
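As a minimal, self-contained illustration of the plain (luckiness-free) NML distribution in the simplest possible case, the Bernoulli model on binary sequences of length n, the following Python sketch may help; it is our own example rather than code from the paper, and it ignores the more general formulations and faster computation schemes the paper discusses.

    from math import comb

    def bernoulli_nml(n):
        """Illustrative NML distribution for the Bernoulli model on binary
        sequences of length n.  The maximized likelihood of a sequence with
        k ones is (k/n)^k * ((n-k)/n)^(n-k); NML divides this by the
        parametric complexity, i.e. the sum of maximized likelihoods over
        all 2^n sequences (grouped by k via the binomial coefficient)."""
        def max_lik(k):
            # maximized likelihood of a single sequence containing k ones
            # (Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here)
            p = k / n
            return p ** k * (1 - p) ** (n - k)

        complexity = sum(comb(n, k) * max_lik(k) for k in range(n + 1))
        # NML probability of one particular sequence containing k ones
        return {k: max_lik(k) / complexity for k in range(n + 1)}

    if __name__ == "__main__":
        nml = bernoulli_nml(10)
        print(nml[3])                                         # NML probability of one sequence with 3 ones out of 10
        print(sum(comb(10, k) * p for k, p in nml.items()))   # total probability over all sequences: 1.0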

The Fundamental Concept
Motivation
Asymptotic Expansions
Unifying Model Selection and Estimation
Log-Loss Prediction and Universal Distributions
The Luckiness Function
The Switch Distribution and the AIC-BIC Dilemma
Hypothesis Testing
Graphical Models
Factorized NML and Variants
Asymptotic Expansions for Graphical Models
Latent Variable and Irregular Models
Frequentist Convergence of MDL and Its Implications
Frequentist Convergence of MDL Estimation
From MDL to Lasso
Misspecification
PAC-MDL Bounds and Deep Learning
Concluding Remarks