Abstract

Regression and feature selection require the minimisation of $\|X\beta - y\|_2$ with respect to $\beta$, where $X \in \mathbb{R}^{n \times p}$, with $n < p$ in feature selection and $n \ge p$ in regression. The vector $\beta$ contains the coefficients of the basis functions in regression, and the weights of the features in feature selection. This paper considers the stability of $\beta$, as measured by the ratio of its relative error to the relative error in $y$, and it is shown that the condition number $\kappa(X)$ of $X$ is not a good measure of this stability. In particular, a large value of $\kappa(X)$ may lead to incorrect conclusions about the stability of $\beta$: it may be thought that regularisation must be applied to the normal equation $X^T X \beta = X^T y$ if $\kappa(X) \gg 1$, but its application may lead to a large error in $\beta$. It is shown in this paper that (a) neither the presence of noise in $y$ nor the condition $\kappa(X) \gg 1$ implies that regularisation must be applied to the normal equation, and (b) the condition $\kappa(X) \gg 1$ does not imply that a small relative error in $y$ yields a large relative error in $\beta$. These disadvantages of $\kappa(X)$ motivate the effective condition number $\eta(X, y)$, which provides a better measure of the stability of $\beta$ with respect to a perturbation in $y$, although it is difficult to compute reliably in some circumstances. Regularisation requires that a constraint be imposed on the solution of the normal equation, and it is shown that a constraint on $\|\beta\|_1$ can be interpreted in terms of the column sums of $X$, and that a constraint on $\|\beta\|_2$ can be interpreted in terms of the singular value decomposition of $X$. The paper contains several examples that illustrate the theoretical results.
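
The distinction between the condition number and the effective condition number can be illustrated with a small numerical sketch. The abstract does not state a formula for the effective condition number, so the code below assumes one common form for a full column rank X: the norm of y divided by the product of the smallest singular value of X and the norm of the least squares solution. The function names, the diagonal test matrix, and the two right-hand sides are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np


def condition_number(X):
    """2-norm condition number: largest singular value over the smallest."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] / s[-1]


def effective_condition_number(X, y):
    """Assumed definition (not taken from the paper):
    eta = ||y||_2 / (sigma_min(X) * ||beta||_2),
    where beta is the least squares solution and X has full column rank.
    """
    s = np.linalg.svd(X, compute_uv=False)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.linalg.norm(y) / (s[-1] * np.linalg.norm(beta))


# Hypothetical ill-conditioned matrix with singular values 1 and 1e-6.
X = np.diag([1.0, 1e-6])

# y aligned with the largest singular value: the worst case for stability.
y_unstable = np.array([1.0, 0.0])
# y aligned with the smallest singular value: the best case for stability.
y_stable = np.array([0.0, 1e-6])

kappa = condition_number(X)
eta_unstable = effective_condition_number(X, y_unstable)
eta_stable = effective_condition_number(X, y_stable)

print(f"kappa(X)             = {kappa:.3e}")
print(f"eta(X, y_unstable)   = {eta_unstable:.3e}")
print(f"eta(X, y_stable)     = {eta_stable:.3e}")
```

Under this assumed definition, the matrix is the same in both cases, yet the effective condition number is about 1 for one right-hand side and about 10^6 for the other. This is consistent with the claim that a large condition number alone does not imply that a small relative error in y yields a large relative error in the solution.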
