Abstract

In recent years, tools from information theory have played an increasingly prevalent role in statistical machine learning. In addition to developing efficient, computationally feasible algorithms for analyzing complex datasets, it is of theoretical importance to determine whether such algorithms are “optimal” in the sense that no other algorithm can lead to smaller statistical error. This paper provides a survey of various techniques used to derive information-theoretic lower bounds for estimation and learning. We focus on the settings of parameter and function estimation, community recovery, and online learning for multi-armed bandits. A common theme is that lower bounds are established by relating the statistical learning problem to a channel decoding problem, for which lower bounds may be derived involving information-theoretic quantities such as the mutual information, total variation distance, and Kullback–Leibler divergence. We close by discussing the use of information-theoretic quantities to measure independence in machine learning applications ranging from causality to medical imaging, and mention techniques for estimating these quantities efficiently in a data-driven manner.
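As a brief illustration of the channel decoding reduction described above (a standard sketch rather than a formula taken from the paper; the symbols V, Y, M, and δ are our own notation), suppose an index V is drawn uniformly from a 2δ-separated packing {θ_1, …, θ_M} of the parameter space and the data Y are generated from the corresponding distribution. Fano's inequality bounds the error of any decoder \hat{V}:

\[
  \mathbb{P}(\hat{V} \neq V) \;\ge\; 1 - \frac{I(V;Y) + \log 2}{\log M},
\]

and since any estimator with loss below δ would identify the index, the minimax risk is at least δ times the right-hand side.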

Highlights

  • Statistical learning theory refers to the rigorous mathematical analysis of machine learning algorithms [1,2]

  • A general approach is to relate the machine learning task at hand to an appropriate channel decoding problem, where the output corresponds to the observed data and the input corresponds to a cleverly constructed subset of the parameter space

  • Although bounding regret in online learning is a radically different goal from bounding estimation error, the techniques used to obtain lower bounds for multi-armed bandits share components with reductions to channel decoding problems: the key is to relate the performance of a learning algorithm to the problem of distinguishing between pairs of parameter assignments corresponding to underlying reward distributions that are close in parameter space


Summary

Introduction

Statistical learning theory refers to the rigorous mathematical analysis of machine learning algorithms [1,2]. A general approach for deriving lower bounds is to relate the machine learning task at hand to an appropriate channel decoding problem, where the output corresponds to the observed data and the input corresponds to a cleverly constructed subset of the parameter space. The hardness of the decoding problem may in turn be quantified using techniques in information theory [3], leading to a lower bound on the estimation error. This strategy has been applied successfully to a diverse array of statistical estimation problems, including parametric and nonparametric regression, structure estimation for graphical models, covariance matrix estimation, and dimension reduction methods such as principal component analysis [4,5,6,7,8,9]. Although bounding regret in online learning is a radically different goal from bounding estimation error, the techniques used to obtain lower bounds for multi-armed bandits share components with reductions to channel decoding problems: the key is to relate the performance of a learning algorithm to the problem of distinguishing between pairs of parameter assignments corresponding to underlying reward distributions that are close in parameter space. We have intentionally selected a diverse variety of problem settings in order to help the reader compare and contrast different approaches for obtaining lower bounds and identify the common threads underlying all the strategies.
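To make the bandit reduction concrete, the following is a hedged sketch of the standard two-point argument (our notation; the precise inequalities emphasized in the survey may differ). For two bandit instances ν and ν' whose arm reward distributions agree except at a single arm, the Bretagnolle–Huber inequality states that, for any event A defined on the observed interaction,

\[
  \mathbb{P}_{\nu}(A) + \mathbb{P}_{\nu'}(A^c) \;\ge\; \tfrac{1}{2}\exp\bigl(-\mathrm{KL}(\mathbb{P}_{\nu},\mathbb{P}_{\nu'})\bigr),
  \qquad
  \mathrm{KL}(\mathbb{P}_{\nu},\mathbb{P}_{\nu'}) = \sum_{i} \mathbb{E}_{\nu}[T_i(n)]\,\mathrm{KL}(\nu_i,\nu'_i),
\]

where T_i(n) is the number of times arm i is pulled over n rounds. Taking A to be the event of large regret under ν shows that any algorithm must suffer large regret on at least one of the two instances unless it pulls the distinguishing arm often enough, which itself incurs regret.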

Statistical Estimation
Fano’s Method
Local Packings
Metric Entropy
Community Recovery
Weak Recovery
Exact Recovery
Online Learning
Stochastic Bandits
Adversarial Bandits
Discussion
