Abstract

We present a method for selecting explanatory variables in multivariate linear regression under a normality assumption. When the number of explanatory variables is large, computing a variable selection criterion for every subset of explanatory variables is cumbersome. We therefore propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. Consistency of the method is established under a high-dimensional asymptotic framework in which the sample size tends to infinity and the sum of the dimensions of the response and explanatory vectors, divided by the sample size, tends to a positive constant less than one. Numerical simulations show that the proposed method selects the true subset of explanatory variables with high probability and runs quickly at moderate sample sizes, even when the number of dimensions is large.

Highlights

  • Multivariate linear regression is a widely known method of inferential analysis

  • Let Y be an n × p observation matrix of p response variables and X be an n × k observation matrix of k non-stochastic explanatory variables, where n is the sample size, and p and k are the numbers of response variables and explanatory variables, respectively

  • Suppose that j denotes a subset of the full set ω = {1, . . . , k} containing k_j elements, and X_j denotes the n × k_j matrix consisting of the columns of X indexed by the elements of j, where k_A denotes the number of elements in a set A, i.e., k_A = #(A)
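
The notation above, together with the abstract's point that evaluating a criterion on every subset of ω is costly, can be illustrated with a minimal Python sketch. It should not be read as the paper's algorithm. This excerpt does not state the formula of the generalized $C_{p}$ criterion, so the sketch assumes the form GC_p(j) = (n − k) tr(S_j S_ω^{-1}) + α p k_j with S_j = Y'(I − P_j)Y / n, where P_j is the projection onto the columns of X_j and α is a penalty chosen by the caller. The function names `residual_cov`, `gcp` and `select_leave_one_out`, the leave-one-variable-out rule, and the simulated data are all hypothetical choices made for illustration. Indices are 0-based here, whereas ω in the text is 1-based.

```python
import numpy as np


def residual_cov(Y, Xj):
    """S_j = Y'(I - P_j)Y / n, with P_j the projection onto the columns of X_j."""
    n = Y.shape[0]
    if Xj.shape[1] == 0:
        resid = Y  # empty subset: nothing is projected out
    else:
        coef, *_ = np.linalg.lstsq(Xj, Y, rcond=None)
        resid = Y - Xj @ coef
    return resid.T @ resid / n


def gcp(Y, X, j, alpha):
    """Assumed criterion: (n - k) * tr(S_j S_omega^{-1}) + alpha * p * k_j."""
    n, p = Y.shape
    k = X.shape[1]
    S_j = residual_cov(Y, X[:, j])   # X[:, j] is the matrix X_j
    S_omega = residual_cov(Y, X)     # full model, j = omega
    # tr(S_j S_omega^{-1}) equals tr(S_omega^{-1} S_j), computed via a linear solve
    fit = (n - k) * np.trace(np.linalg.solve(S_omega, S_j))
    return fit + alpha * p * len(j)


def select_leave_one_out(Y, X, alpha):
    """Keep variable l when dropping it from omega increases the criterion.

    This needs k + 1 criterion evaluations instead of the 2^k of a full search.
    """
    k = X.shape[1]
    omega = list(range(k))
    full = gcp(Y, X, omega, alpha)
    selected = []
    for l in omega:
        without_l = [i for i in omega if i != l]
        if gcp(Y, X, without_l, alpha) > full:
            selected.append(l)
    return selected


# Simulated example (assumed data): only the first three columns of X matter.
rng = np.random.default_rng(0)
n, p, k = 200, 3, 6
X = rng.standard_normal((n, k))
B = np.zeros((k, p))
B[:3, :] = 1.0
Y = X @ B + rng.standard_normal((n, p))

selected = select_leave_one_out(Y, X, alpha=np.log(n))
```

The penalty `alpha=np.log(n)` in the usage lines is only a placeholder. Which penalty makes the selection consistent under the high-dimensional framework is the subject of the paper and is not reproduced in this excerpt.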


Summary

Introduction

Multivariate linear regression is a widely known method of inferential analysis. It features in many theoretical and applied textbooks (see, e.g., [21, chap. 9], [24, chap. 4]) and is used by researchers in many fields. A consistent variable selection criterion is expected to have a high probability of selecting the true subset j∗, because in general the probability of selecting the true subset is approximated by the asymptotic probability. To this end, let LS, LR, LE and LTE denote the large-sample, large-response vector, large-explanatory vector and large-true explanatory vector asymptotic frameworks, in which only n, p, k and k∗, respectively, tend to infinity. To study the consistency of variable selection criteria under the full search method, [4, 27, 28] used asymptotic frameworks in which (p + k)/n → c ∈ [0, 1).

Preliminaries
Proposed selection method
Consistency of proposed selection method
Extension of the ZKB selection method
Numerical studies
