Abstract
Besides the minimization of the prediction error, two of the most desirable properties of a regression scheme are stability and interpretability. Driven by these principles, we propose continuous-domain formulations for one-dimensional regression problems. In our first approach, we use the Lipschitz constant as a regularizer, which results in an implicit tuning of the overall robustness of the learned mapping. In our second approach, we control the Lipschitz constant explicitly using a user-defined upper bound and make use of a sparsity-promoting regularizer to favor simpler (and, hence, more interpretable) solutions. The theoretical study of the latter formulation is motivated in part by its equivalence, which we prove, with the training of a Lipschitz-constrained two-layer univariate neural network with rectified linear unit (ReLU) activations and weight decay. By proving representer theorems, we show that both problems admit global minimizers that are continuous and piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the least number of linear regions. Finally, we illustrate numerically the outcome of our formulations.
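For intuition, the two-layer univariate ReLU network referenced above can be written as f(x) = Σ_k v_k ReLU(w_k x + b_k) + c. Below is a minimal NumPy sketch of this parameterization and a weight-decay objective; the function names and the squared-error data term are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def relu_net(x, w, b, v, c=0.0):
    """Two-layer univariate ReLU network: f(x) = sum_k v_k * max(w_k*x + b_k, 0) + c."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ v + c

def weight_decay_objective(x, y, w, b, v, lam):
    """Squared-error data fit plus weight decay on the inner and outer weights.
    The squared-error loss is an illustrative choice of E."""
    residual = relu_net(x, w, b, v) - y
    return np.sum(residual**2) + lam * (np.sum(w**2) + np.sum(v**2))
```

For example, the weights w = (1, −1), b = (0, 0), v = (1, 1) realize f(x) = |x|, a CPWL function with two linear regions, which matches the form of solutions guaranteed by the representer theorems above.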
Highlights
A prominent example is the family of reproducing-kernel Hilbert spaces (RKHS), F = H(R^d), X = R^d, Y = R [7], [8], in which the regression problem is formulated as

min_{f∈H(R^d)} Σ_{m=1}^{M} E(ym, f(xm)) + λ‖f‖²_H.
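With a squared-error loss, the classical representer theorem for RKHS reduces this problem to kernel ridge regression: the solution takes the form f = Σ_m a_m k(·, x_m), with coefficients obtained from a linear system. A minimal NumPy sketch, using a Gaussian kernel purely as an illustrative choice:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=0.1):
    """Gaussian kernel matrix k(x1_i, x2_j) for 1D inputs (illustrative kernel choice)."""
    return np.exp(-(x1[:, None] - x2[None, :])**2 / (2 * sigma**2))

def kernel_ridge_fit(x, y, lam, sigma=0.1):
    """Solve (K + lam*I) a = y; by the representer theorem, f = sum_m a_m k(., x_m)."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def kernel_ridge_predict(a, x_train, x_new, sigma=0.1):
    """Evaluate f at new points via the kernel expansion."""
    return gaussian_kernel(x_new, x_train, sigma) @ a

# A small lam makes the fit nearly interpolate the training data
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x)
a = kernel_ridge_fit(x, y, lam=1e-6)
pred = kernel_ridge_predict(a, x, x)
```

Increasing lam trades data fidelity for a smaller RKHS norm of f, which is the stability mechanism that the Lipschitz-based formulations of this paper replace with a direct control on the slope of f.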
The interesting aspect of (5) is that the simplicity and stability of the learned mapping can be adjusted by tuning the parameters λ > 0 and L > 0, respectively. In this case as well, we prove a representer theorem which guarantees the existence of continuous and piecewise-linear (CPWL) solutions.
Although the reconstruction is satisfactory in the active section (x > 1/2), it has many linear regions in the flat section (x < 1/2) that are not present in f0. This is because the active section forces the Lipschitz constant of the reconstruction to be around 1, while oscillations with a slope smaller than 1 in the flat section are not penalized by the regularization. This problem clearly cannot be fixed by a simple increase in the regularization parameter: with λ = 0.2 (Figure 3b), there are still too many linear regions in the flat section, and the active section is poorly reconstructed because the Lipschitz constant is penalized too heavily by the regularization.
Summary
The goal of a regression model is to learn a mapping f : X → Y from a collection of data points (xm, ym) ∈ X × Y, m = 1, . . . , M, such that ym ≈ f(xm), while avoiding the problem of overfitting [1], [2], [3]. A common way of carrying out this task is to solve a minimization problem of the form

min_{f∈F} Σ_{m=1}^{M} E(ym, f(xm)) + R(f),

where F is the underlying search space, the convex loss function E : Y × Y → R≥0 enforces the consistency of the learned mapping with the given data points, and the regularization functional R : F → R≥0 injects prior knowledge on the form of the mapping f and is designed to alleviate the problem of overfitting.
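As one concrete instance of this generic formulation, one can take F to be a space of polynomials, E the squared error, and R a quadratic penalty on the coefficients (ridge regression). This specific choice is illustrative only and is not the formulation studied in the paper:

```python
import numpy as np

def ridge_poly_fit(x, y, lam, degree=3):
    """Minimize sum_m (y_m - f(x_m))^2 + lam * ||coeffs||^2 over
    polynomials f of the given degree, via the normal equations."""
    X = np.vander(x, degree + 1)  # columns: x^degree, ..., x, 1
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

def ridge_poly_predict(coeffs, x_new):
    """Evaluate the fitted polynomial at new points."""
    return np.vander(x_new, len(coeffs)) @ coeffs
```

Here lam plays exactly the role described above: lam = 0 fits the data as closely as possible (risking overfitting), while larger values shrink the coefficients toward a simpler mapping.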