Abstract

Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

Highlights

  • Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing

  • Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method

  • As datasets grow larger and more complex, modern data analysis requires solving high-dimensional estimation problems with very many parameters. Developing algorithms for this task and understanding their limitations have become a major challenge in computer science, machine learning, statistics, signal processing, communications, and related fields. We address this challenge in the case of generalized linear estimation models (GLMs) (1, 2), where data are generated as follows: given an n-dimensional vector X∗ hidden from the statistician, one observes instead an m-dimensional vector Y whose components read Y_μ ∼ P_out(· | (ΦX∗)_μ/√n) for μ = 1, …, m, with Φ a known m × n random data matrix and P_out a known output channel (see the data-generation sketch below).
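A minimal sketch of this data-generation model follows. The concrete choices here are illustrations, not fixed by the text above: an i.i.d. standard Gaussian prior P_0 and data matrix Φ, and the perceptron's noiseless sign channel as one instance of P_out (the perceptron being a special case the paper discusses):

```python
# Minimal sketch of the random-GLM data model Y_mu ~ P_out(.|(Phi X*)_mu / sqrt(n)).
# Assumed here (illustrative choices only): Gaussian prior P_0, i.i.d. Gaussian
# data matrix Phi, and the noiseless perceptron channel y = sign(z).
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 2000                  # dimension n, samples m; the ratio alpha = m/n is fixed

x_star = rng.standard_normal(n)    # hidden signal/weights X*, drawn from the prior P_0
Phi = rng.standard_normal((m, n))  # known random data matrix
z = Phi @ x_star / np.sqrt(n)      # projections (Phi X*)_mu / sqrt(n), each of order 1
y = np.sign(z)                     # perceptron output; any noisy channel P_out(.|z) fits here
```

The 1/√n scaling keeps each projection of order 1 as n grows, which is what makes the fixed-ratio high-dimensional limit m/n → α well defined.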


Summary

Main Results

For the random GLM problem as defined in the Introduction, the optimal way to estimate the ground-truth signal/weights X∗ relies on its posterior probability distribution

P(x | Y, Φ) = (1/Z_n) ∏_{i=1}^{n} P_0(x_i) ∏_{μ=1}^{m} P_out(Y_μ | (Φx)_μ/√n),

where Z_n is the normalization constant (the partition function), P_0 is the prior on the components of the signal, and P_out(Y_μ | z) is the probability that an output Y_μ is observed given the projection z = (Φx)_μ/√n.
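To make the posterior concrete, the following brute-force sketch (an illustration, not code from the paper) computes it exactly on a tiny instance. Assumed here: a Rademacher prior P_0 = Uniform{−1, +1}, whose contribution is a constant, and a Gaussian channel P_out(y | z) = N(y; z, σ²). Enumerating all 2^n sign configurations yields the Bayes-optimal (MMSE) estimator, i.e., the posterior mean:

```python
# Brute-force evaluation of the GLM posterior for small n (illustrative choices:
# Rademacher prior, Gaussian channel P_out(y|z) = N(y; z, sigma^2)).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 8, 16, 0.5

x_star = rng.choice([-1.0, 1.0], size=n)              # ground-truth signal X*
Phi = rng.standard_normal((m, n))                     # random data matrix
y = Phi @ x_star / np.sqrt(n) + sigma * rng.standard_normal(m)

# Log of the unnormalized posterior: the flat Rademacher prior is a constant,
# so only sum_mu log P_out(Y_mu | (Phi x)_mu / sqrt(n)) matters.
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))  # all 2^n signals
z = configs @ Phi.T / np.sqrt(n)                      # (2^n, m) channel arguments
log_w = -0.5 * np.sum((y - z) ** 2, axis=1) / sigma**2
w = np.exp(log_w - log_w.max())
w /= w.sum()                                          # normalized posterior weights

x_mmse = w @ configs                                  # posterior mean = Bayes-optimal estimate
print("MMSE per-component error:", np.mean((x_mmse - x_star) ** 2))
```

This enumeration is exponential in n and only serves to illustrate the formula; the paper's results concern the high-dimensional limit, where the generalized approximate message-passing algorithm achieves the Bayes-optimal error in the regions of parameters characterized in the Main Results.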
