Abstract
Besides the minimization of the prediction error, two of the most desirable properties of a regression scheme are stability and interpretability. Driven by these principles, we propose continuous-domain formulations for one-dimensional regression problems. In our first approach, we use the Lipschitz constant as a regularizer, which results in an implicit tuning of the overall robustness of the learned mapping. In our second approach, we control the Lipschitz constant explicitly using a user-defined upper bound and make use of a sparsity-promoting regularizer to favor simpler (and, hence, more interpretable) solutions. The theoretical study of the latter formulation is motivated in part by its equivalence, which we prove, with the training of a Lipschitz-constrained two-layer univariate neural network with rectified linear unit (ReLU) activations and weight decay. By proving representer theorems, we show that both problems admit global minimizers that are continuous and piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the least number of linear regions. Finally, we illustrate numerically the outcome of our formulations.
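For intuition, the two-layer univariate ReLU network referenced above can be written as f(x) = Σ_k v_k ReLU(w_k x + b_k) + c. Below is a minimal NumPy sketch of this parameterization and a weight-decay objective; the function names and the squared-error data term are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def relu_net(x, w, b, v, c=0.0):
    """Two-layer univariate ReLU network: f(x) = sum_k v_k * max(w_k*x + b_k, 0) + c."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ v + c

def weight_decay_objective(x, y, w, b, v, lam):
    """Squared-error data fit plus weight decay on the inner and outer weights.
    The squared-error loss is an illustrative choice of E."""
    residual = relu_net(x, w, b, v) - y
    return np.sum(residual**2) + lam * (np.sum(w**2) + np.sum(v**2))
```

For example, the weights w = (1, −1), b = (0, 0), v = (1, 1) realize f(x) = |x|, a CPWL function with two linear regions, which matches the form of solutions guaranteed by the representer theorems above.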
Highlights
A prominent example is the family of reproducing-kernel Hilbert spaces (RKHS), F = H(R^d), X = R^d, Y = R [7], [8], in which the regression problem is formulated as

min_{f∈H(R^d)} Σ_{m=1}^{M} E(ym, f(xm)) + λ‖f‖²_H.
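With a squared-error loss, the classical representer theorem for RKHS reduces this problem to kernel ridge regression: the solution takes the form f = Σ_m a_m k(·, x_m), with coefficients obtained from a linear system. A minimal NumPy sketch, using a Gaussian kernel purely as an illustrative choice:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=0.1):
    """Gaussian kernel matrix k(x1_i, x2_j) for 1D inputs (illustrative kernel choice)."""
    return np.exp(-(x1[:, None] - x2[None, :])**2 / (2 * sigma**2))

def kernel_ridge_fit(x, y, lam, sigma=0.1):
    """Solve (K + lam*I) a = y; by the representer theorem, f = sum_m a_m k(., x_m)."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def kernel_ridge_predict(a, x_train, x_new, sigma=0.1):
    """Evaluate f at new points via the kernel expansion."""
    return gaussian_kernel(x_new, x_train, sigma) @ a

# A small lam makes the fit nearly interpolate the training data
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x)
a = kernel_ridge_fit(x, y, lam=1e-6)
pred = kernel_ridge_predict(a, x, x)
```

Increasing lam trades data fidelity for a smaller RKHS norm of f, which is the stability mechanism that the Lipschitz-based formulations of this paper replace with a direct control on the slope of f.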
The interesting aspect of (5) is that the simplicity and stability of the learned mapping can be adjusted by tuning the parameters λ > 0 and L > 0, respectively. In this case as well, we prove a representer theorem which guarantees the existence of continuous and piecewise-linear (CPWL) solutions.
Although the reconstruction is satisfactory in the active section (x > 1/2), it has many linear regions in the flat section (x < 1/2) that are not present in f0. This is because the active section forces the Lipschitz constant of the reconstruction to be around 1, while oscillations with a slope smaller than 1 in the flat section are not penalized by the regularization. This problem clearly cannot be fixed by a simple increase in the regularization parameter: with λ = 0.2 (Figure 3b), there are still too many linear regions in the flat section, and the active section is poorly reconstructed because the Lipschitz constant is penalized too heavily by the regularization.
Summary
The goal of a regression model is to learn a mapping f : X → Y from a collection of data points (xm, ym) ∈ X × Y, m = 1, . . . , M, such that ym ≈ f(xm), while avoiding the problem of overfitting [1], [2], [3]. A common way of carrying out this task is to solve a minimization problem of the form

min_{f∈F} Σ_{m=1}^{M} E(ym, f(xm)) + R(f),

where F is the underlying search space, the convex loss function E : Y × Y → R≥0 enforces the consistency of the learned mapping with the given data points, and the regularization functional R : F → R≥0 injects prior knowledge on the form of the mapping f and is designed to alleviate the problem of overfitting.
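As one concrete instance of this generic formulation, one can take F to be a space of polynomials, E the squared error, and R a quadratic penalty on the coefficients (ridge regression). This specific choice is illustrative only and is not the formulation studied in the paper:

```python
import numpy as np

def ridge_poly_fit(x, y, lam, degree=3):
    """Minimize sum_m (y_m - f(x_m))^2 + lam * ||coeffs||^2 over
    polynomials f of the given degree, via the normal equations."""
    X = np.vander(x, degree + 1)  # columns: x^degree, ..., x, 1
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

def ridge_poly_predict(coeffs, x_new):
    """Evaluate the fitted polynomial at new points."""
    return np.vander(x_new, len(coeffs)) @ coeffs
```

Here lam plays exactly the role described above: lam = 0 fits the data as closely as possible (risking overfitting), while larger values shrink the coefficients toward a simpler mapping.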