Abstract

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to a large extent, be viewed from a unified perspective.
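To make the "luckiness function" claim concrete, here is a worked equation of our own, paraphrasing the standard luckiness-based definition rather than quoting the paper: the MDL estimator with luckiness function v maximizes v(θ) p_θ(x^n), so that penalized maximum likelihood and Bayesian MAP estimation arise as special cases.

    \[
      \theta_v(x^n) \;=\; \arg\max_{\theta}\, v(\theta)\, p_\theta(x^n)
      \;=\;
      \begin{cases}
        \arg\min_{\theta}\,\bigl(-\log p_\theta(x^n) + \mathrm{pen}(\theta)\bigr)
          & \text{if } v(\theta) = e^{-\mathrm{pen}(\theta)} \ \text{(penalized ML)},\\[4pt]
        \arg\max_{\theta}\, \pi(\theta)\, p_\theta(x^n)
          & \text{if } v = \pi \ \text{is a prior density (Bayes MAP)}.
      \end{cases}
    \]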

Highlights

  • The Minimum Description Length (MDL) Principle [Rissanen, 1978, 1989, Barron et al., 1998, Grünwald, 2007] is a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition

  • Here we present, for the first time, the MDL Principle without resorting to information theory: all the material can be understood without any knowledge of data compression, which should make it a much easier read for statisticians and machine learning researchers new to MDL

  • Over the last 10 years, there have been exciting developments, some of them very recent, which resolve most of the issues with earlier formulations of MDL. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to some extent, be viewed from a unified perspective; as such, this paper should be of interest to researchers working on the foundations of statistics and machine learning

Summary

Introduction

The Minimum Description Length (MDL) Principle [Rissanen, 1978, 1989, Barron et al., 1998, Grünwald, 2007] is a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. Over the last 10 years, there have been exciting developments, some of them very recent, which resolve most of the issues with earlier formulations of MDL. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes, can, to some extent, be viewed from a unified perspective; as such, this paper should be of interest to researchers working on the foundations of statistics and machine learning. We introduce the basic MDL building blocks in a concise yet self-contained way, with substantial underlying motivation, in Section 2, incorporating the extensions to and new insights into these building blocks that have been gathered over the last 10 years. These include more general formulations of arguably the most fundamental universal code, the Normalized Maximum Likelihood (NML) distribution, as well as faster ways to compute it. We use θ to denote general estimators, and θ_v to denote what we call the MDL estimator with luckiness function v; see (5).
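As a minimal, self-contained illustration of the plain (luckiness-free) NML distribution in the simplest possible case, the Bernoulli model on binary sequences of length n, the following Python sketch may help; it is our own example rather than code from the paper, and it ignores the more general formulations and faster computation schemes the paper discusses.

    from math import comb

    def bernoulli_nml(n):
        """Illustrative NML distribution for the Bernoulli model on binary
        sequences of length n.  The maximized likelihood of a sequence with
        k ones is (k/n)^k * ((n-k)/n)^(n-k); NML divides this by the
        parametric complexity, i.e. the sum of maximized likelihoods over
        all 2^n sequences (grouped by k via the binomial coefficient)."""
        def max_lik(k):
            # maximized likelihood of a single sequence containing k ones
            # (Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here)
            p = k / n
            return p ** k * (1 - p) ** (n - k)

        complexity = sum(comb(n, k) * max_lik(k) for k in range(n + 1))
        # NML probability of one particular sequence containing k ones
        return {k: max_lik(k) / complexity for k in range(n + 1)}

    if __name__ == "__main__":
        nml = bernoulli_nml(10)
        print(nml[3])                                         # NML probability of one sequence with 3 ones out of 10
        print(sum(comb(10, k) * p for k, p in nml.items()))   # total probability over all sequences: 1.0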

The Fundamental Concept
Motivation
Asymptotic Expansions
Unifying Model Selection and Estimation
Log-Loss Prediction and Universal Distributions
The Luckiness Function
The Switch Distribution and the AIC-BIC Dilemma
Hypothesis Testing
Graphical Models
Factorized NML and Variants
Asymptotic Expansions for Graphical Models
Latent Variable and Irregular Models
Frequentist Convergence of MDL and Its Implications
Frequentist Convergence of MDL Estimation
From MDL to Lasso
Misspecification
PAC-MDL Bounds and Deep Learning
Concluding Remarks