Abstract
We propose a method for selecting explanatory variables in multivariate linear regression under a normality assumption. When the number of explanatory variables is large, computing a variable selection criterion for every subset of explanatory variables is cumbersome. We therefore propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. Consistency of the method is established under a high-dimensional asymptotic framework in which the sample size tends to infinity and the sum of the dimensions of the response and explanatory vectors, divided by the sample size, tends to a positive constant less than one. Numerical simulations show that the proposed method selects the true subset of explanatory variables with high probability and runs quickly at moderate sample sizes, even when the number of dimensions is large.
Highlights
Multivariate linear regression is a widely known method of inferential analysis
Let $Y$ be an $n \times p$ observation matrix of $p$ response variables and $X$ be an $n \times k$ observation matrix of $k$ non-stochastic explanatory variables, where $n$ is the sample size, and $p$ and $k$ are the numbers of response variables and explanatory variables, respectively
Suppose that $j$ denotes a subset of the full set $\omega = \{1, \ldots, k\}$ containing $k_{j}$ elements, and $X_{j}$ denotes the $n \times k_{j}$ matrix consisting of the columns of $X$ indexed by the elements of $j$, where $k_{A}$ denotes the number of elements in a set $A$, i.e., $k_{A} = \#(A)$
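The notation above can be made concrete with a small NumPy sketch. It only illustrates how a subset $j$ picks out the columns $X_{j}$ and how a least-squares fit on those columns might be computed. It does not implement the generalized $C_{p}$ criterion, whose formula is not given in this excerpt. The function names `select_columns` and `fit_subset`, the simulated data, and the use of 1-based indices to match $\omega = \{1, \ldots, k\}$ are all assumptions made for illustration.

```python
import numpy as np


def select_columns(X, j):
    """Return X_j: the columns of X indexed by the subset j (1-based, as in omega)."""
    idx = [i - 1 for i in sorted(j)]  # convert to 0-based column indices
    return X[:, idx]


def fit_subset(Y, X, j):
    """Least-squares fit of Y on X_j.

    Returns the k_j x p coefficient matrix and the p x p residual
    sum-of-squares-and-cross-products matrix.
    """
    Xj = select_columns(X, j)
    B, *_ = np.linalg.lstsq(Xj, Y, rcond=None)
    resid = Y - Xj @ B
    return B, resid.T @ resid


# Hypothetical simulated data: n observations, p responses, k explanatory variables.
rng = np.random.default_rng(0)
n, p, k = 50, 3, 6
X = rng.standard_normal((n, k))
Y = rng.standard_normal((n, p))

j = {1, 3, 4}            # a candidate subset of omega = {1, ..., k}
Xj = select_columns(X, j)
B_j, S_j = fit_subset(Y, X, j)
print(Xj.shape, B_j.shape, S_j.shape)
```

A selection criterion would then be evaluated on quantities such as `S_j` for each candidate subset. With $k$ explanatory variables there are $2^{k}$ subsets, which is why a full search becomes cumbersome as $k$ grows.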
Summary
Multivariate linear regression is a widely known method of inferential analysis. It features in many theoretical and applied textbooks (see, e.g., [21, chap 9], [24, chap 4]) and is used by researchers in many fields. A consistent variable selection criterion is expected to have a high probability of selecting the true subset $j_{*}$, because in general the probability of selecting the true subset is approximated by the asymptotic probability. Let LS, LR, LE and LTE denote the large-sample (LS), large-response vector (LR), large-explanatory vector (LE) and large-true explanatory vector (LTE) asymptotic frameworks, in which only $n$, $p$, $k$ and $k_{*}$, respectively, tend to infinity. In studying the consistency of variable selection criteria under the full search method, [4, 27, 28] used asymptotic frameworks in which $(p + k)/n \to c \in [0, 1)$.