Abstract

The computational problem of finding the best-fitting subset of independent variables in least-squares regression with a fixed subset size is addressed, especially in the nonfull-rank case with more variables than observations. For the full-rank case, the most efficient widely used methods work by finding the complementary subset whose deletion causes the minimum reduction in the total regression sum of squares, a task that can usually be accomplished with far less computation than exhaustive evaluation of all subsets. Here, a method using Cholesky-type factorizations (Algorithm 2) has been developed that also takes advantage of the computational savings offered by the reduction approach, but that can be used in nonfull-rank cases where existing methods are not applicable. Algorithm 2 is derived by examining the asymptotic behavior of a full-rank procedure (Algorithm 1) applied to a ridge-type perturbation of the cross-product matrix. In the course of testing, it was discovered that Algorithm 1, with an appropriate ridge parameter, usually selected the best subset with less computation than Algorithm 2; however, if mathematical certainty is required, Algorithm 2 should be used. Also, some new approaches are proposed for developing efficient methods that identify the best subset directly, rather than as the complement of the minimum-reduction subset.
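
Neither Algorithm 1 nor Algorithm 2 is specified in the abstract, so the sketch below implements neither. It is a minimal exhaustive-search illustration in Python with NumPy (all function names, test data, and the ridge value are chosen here for exposition) of the two ideas the abstract relies on: (i) the best size-p subset is the complement of the size-(k-p) subset whose deletion causes the minimum reduction in the total regression sum of squares, and (ii) a small ridge perturbation of the cross-product matrix keeps Cholesky-type factorizations well defined when exact collinearity would otherwise make them break down.

```python
import itertools
import numpy as np

def reg_ss(XtX, Xty, cols):
    """Regression sum of squares b'A^{-1}b for the subset `cols`, via Cholesky."""
    A = XtX[np.ix_(cols, cols)]
    b = Xty[cols]
    L = np.linalg.cholesky(A)        # fails unless the subset matrix is positive definite
    z = np.linalg.solve(L, b)        # z = L^{-1} b
    return float(z @ z)              # ||z||^2 = b'A^{-1}b

def best_subset(X, y, p, ridge=0.0):
    """Best size-p subset by maximum regression SS, computed on X'X + ridge*I.

    A small ridge > 0 perturbs the cross-product matrix so every Cholesky
    factorization exists even when X'X is singular (the nonfull-rank case);
    letting ridge -> 0 recovers the unperturbed selection.
    """
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    Xty = X.T @ y
    return max(itertools.combinations(range(X.shape[1]), p),
               key=lambda cols: reg_ss(XtX, Xty, list(cols)))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X[:, [1, 4, 6]] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(30)
XtX, Xty = X.T @ X, X.T @ y
total = reg_ss(XtX, Xty, list(range(8)))

# Equivalence exploited by the efficient full-rank methods: the best size-3
# subset is the complement of the size-5 subset whose deletion causes the
# minimum reduction in the total regression sum of squares.
direct = best_subset(X, y, 3)
deleted = min(itertools.combinations(range(8), 5),
              key=lambda c: total - reg_ss(XtX, Xty,
                                           [j for j in range(8) if j not in c]))
assert direct == tuple(j for j in range(8) if j not in deleted)

# Nonfull-rank illustration: an exactly collinear column makes some subset
# factorizations singular; a tiny ridge keeps every Cholesky well defined.
Xc = np.hstack([X, X[:, [1]]])       # column 8 duplicates column 1
print(best_subset(Xc, y, 3, ridge=1e-6))
```

The exhaustive search here is only to make the minimum-reduction equivalence easy to verify; the point of the paper's methods is precisely to find these subsets with far less computation than this brute-force pass over all combinations.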
