Abstract

The minimum description length (MDL) principle is extended to supervised learning. The MDL principle is a philosophy that the shortest description of given data leads to the best hypothesis about the data source. One of the key theories for the MDL principle is Barron and Cover's theory (BC theory), which mathematically justifies the MDL principle based on two-stage codes in density estimation (unsupervised learning). Although the codelength of a two-stage code looks similar to the objective function of penalized likelihood methods, penalized likelihood methods optimize their parameters without quantizing the parameter space. Recently, Chatterjee and Barron provided theoretical tools that extend BC theory to penalized likelihood methods by overcoming this difference. Indeed, applying their tools, they showed that the well-known penalized likelihood method ‘lasso’ can be interpreted as an MDL estimator and enjoys a performance guarantee from BC theory. An important fact is that their results assume a fixed design setting, which is essentially the same as unsupervised learning. The fixed design is natural if lasso is used for compressed sensing. If lasso is used for supervised learning, however, the fixed design setting is considerably unsatisfactory; only the random design setting is acceptable. However, it is inherently difficult to extend BC theory to the random design setting, regardless of whether the parameter space is quantized. In this paper, a novel theoretical tool for extending BC theory to supervised learning (the random design setting, without quantization of the parameter space) is provided. Applying this tool, we prove that, when the covariates are subject to a Gaussian distribution, lasso in the random design setting can also be interpreted as an MDL estimator and enjoys the risk bound of BC theory. The risk/regret bounds obtained have several advantages inherited from BC theory. First, the bounds require remarkably few assumptions. Second, the bounds hold for any finite sample size $n$ and any finite feature number $p$, even if $n \ll p$. The behavior of the regret bound is investigated through numerical simulations. We believe that this is the first extension of BC theory to supervised learning (the random design setting).
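
For concreteness, the two-stage codelength at the heart of BC theory and the penalized-likelihood objective it resembles can be written schematically as follows (generic notation, not necessarily the paper's symbols):

$$
\hat{\theta}_{\mathrm{MDL}} \;=\; \operatorname*{arg\,min}_{\theta \in \tilde{\Theta}} \Bigl\{ -\log p_{\theta}(y^{n}) + L(\theta) \Bigr\}, \qquad \sum_{\theta \in \tilde{\Theta}} 2^{-L(\theta)} \le 1,
$$

where $\tilde{\Theta}$ is a quantized parameter set and $L(\theta)$ is a codelength for the parameter satisfying the Kraft inequality, whereas a penalized likelihood method such as lasso minimizes $-\log p_{\theta}(y^{n} \mid x^{n}) + \mathrm{pen}(\theta)$ over the continuous parameter space, with $\mathrm{pen}(\theta) = \lambda \lVert \theta \rVert_{1}$ in the lasso case.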

Highlights

  • The minimum description length (MDL) principle is a philosophy that the shortest description of given data leads to the best hypothesis about the data source

  • Statistical risk measures the discrepancy between the true distribution generating data and the probability distribution used to encode the data in the two-stage code

  • When the covariates are subject to a Gaussian distribution, this extension proves that lasso is a BC-proper MDL estimator even in the random design setting


Summary

INTRODUCTION

The minimum description length (MDL) principle is a philosophy that the shortest description of given data leads to the best hypothesis about the data source. Statistical risk measures the discrepancy between the true distribution generating the data and the probability distribution used to encode the data in the two-stage code; BC theory bounds this risk by the redundancy of the code, and this inequality guarantees that finding a code with small redundancy (description length) leads to a small risk bound. Chatterjee and Barron provided two convenient sufficient conditions for the two requirements of BC theory, respectively, in the case of penalized likelihood, named ‘codelength validity’ for Condition 1 and ‘risk validity’ for Condition 2. Using these tools, they succeeded in showing that lasso [34] is a BC-proper MDL estimator under certain conditions. When the covariates are subject to a Gaussian distribution, our extension proves that lasso is a BC-proper MDL estimator even in the random design setting. We believe that ours is the first work that extends BC theory to penalized likelihood in the random design setting.
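
As a purely illustrative sketch (not the paper's experiments or code), the random design setting discussed above can be simulated with Gaussian covariates, with lasso fitted as a penalized likelihood estimator; the library choices and all parameter values below are assumptions made for illustration.

```python
# Illustrative sketch only: lasso in a random design setting with Gaussian
# covariates, viewed as a penalized-likelihood ("two-stage codelength"-style)
# estimator. All constants here are assumptions, not the paper's settings.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                          # n << p is permitted by BC-type bounds
X = rng.standard_normal((n, p))         # random design: i.i.d. Gaussian covariates
beta_true = np.zeros(p)
beta_true[:5] = 1.0                     # sparse ground truth (assumption)
sigma = 0.5
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = 0.1                               # illustrative l1 penalty weight
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Penalized negative log-likelihood under a Gaussian noise model (constants
# dropped); this is the quantity that the MDL interpretation reads as a
# description length of the data plus a description of the parameter.
nll = np.sum((y - X @ beta_hat) ** 2) / (2 * sigma ** 2)
penalty = lam * np.sum(np.abs(beta_hat))    # scaling of the penalty is illustrative
print("penalized NLL (codelength-style objective):", nll + penalty)
print("nonzero coefficients recovered:", np.count_nonzero(beta_hat))
```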

Risk Bound for Lasso
Organization of the Paper
MDL ESTIMATORS IN SUPERVISED LEARNING
Barron and Cover’s Theory for Density Estimation
Barron and Cover’s Theory for Penalized Likelihood
Extension of BC Theory to Supervised Learning
Application to Lasso in Random Design Setting
NUMERICAL SIMULATIONS
Proof of Theorem 3
Proof of Theorem 4
Proof of Corollary 5
Rényi Divergence and Its Derivatives
Upper Bound of Negative Hessian
Proof of Lemma 3
Some Remarks on the Proof of Lemma 3
Proof of Lemma 4
Proof of Lemma 5
CONCLUSION