A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables

Ryoya Oda, Hirokazu Yanagihara

doi:10.1214/20-ejs1701

Abstract

We propose a method for selecting explanatory variables in multivariate linear regression under a normality assumption. Computing variable selection criteria for all subsets of explanatory variables is cumbersome when the number of explanatory variables is large. We therefore propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. Consistency is established under a high-dimensional asymptotic framework in which the sample size tends to infinity and the sum of the dimensions of the response and explanatory vectors, divided by the sample size, tends to a positive constant less than one. Numerical simulations show that the proposed method selects the true subset of explanatory variables with high probability and is fast under a moderate sample size, even when the number of dimensions is large.

Highlights

Multivariate linear regression is a widely known method of inferential analysis.
Let $Y$ be an $n \times p$ observation matrix of $p$ response variables and $X$ be an $n \times k$ observation matrix of $k$ non-stochastic explanatory variables, where $n$ is the sample size, and $p$ and $k$ are the numbers of response variables and explanatory variables, respectively.
Suppose that $j$ denotes a subset of the full set $\omega = \{1, \ldots, k\}$ containing $k_{j}$ elements, and $X_{j}$ denotes the $n \times k_{j}$ matrix consisting of the columns of $X$ indexed by the elements of $j$, where $k_{A}$ denotes the number of elements in a set $A$, i.e., $k_{A} = \#(A)$.
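
As a rough illustration of this notation, the Python sketch below forms the submatrix of X for a subset j and scores it with a generalized C_p-type criterion. The formula used, (n - k) tr(W_ω^{-1} W_j) + p α k_j with W_j the residual sum-of-squares matrix, is one common form and is an assumption here, not the paper's stated definition. The function names and the drop-one rule, which compares the full set with each set missing a single variable so that only k + 1 criterion values are needed, are likewise hypothetical. Indices are 0-based, unlike ω = {1, ..., k}.

```python
import numpy as np


def residual_ss(Y, X_sub):
    """Residual sum-of-squares matrix W_j = Y'(I - P_j)Y for the columns in X_sub."""
    # Least-squares fit of every response column on the chosen explanatory columns.
    coef, *_ = np.linalg.lstsq(X_sub, Y, rcond=None)
    resid = Y - X_sub @ coef
    return resid.T @ resid


def gcp(Y, X, subset, alpha):
    """Generalized C_p-type score (assumed form, see lead-in); smaller is better."""
    n, p = Y.shape
    k = X.shape[1]
    subset = list(subset)
    W_full = residual_ss(Y, X)             # residuals of the full model omega
    W_sub = residual_ss(Y, X[:, subset])   # residuals of the candidate model j
    # tr(W_full^{-1} W_sub) without forming the inverse explicitly.
    fit = (n - k) * np.trace(np.linalg.solve(W_full, W_sub))
    return fit + p * alpha * len(subset)


def drop_one_selection(Y, X, alpha):
    """Keep variable l when removing it from the full set worsens the score.

    Hypothetical rule for illustration; assumes k >= 2 so that every
    reduced model still contains at least one column.
    """
    k = X.shape[1]
    full = list(range(k))
    score_full = gcp(Y, X, full, alpha)
    selected = []
    for ell in full:
        reduced = [i for i in full if i != ell]
        if gcp(Y, X, reduced, alpha) > score_full:
            selected.append(ell)
    return selected
```

Calling `drop_one_selection(Y, X, alpha)` returns the list of retained column indices. A larger `alpha` penalizes model size more heavily and so retains fewer variables.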

Summary

Introduction

Multivariate linear regression is a widely known method of inferential analysis. It features in many theoretical and applied textbooks (see, e.g., [21, chap. 9], [24, chap. 4]) and is used by researchers in many fields. A consistent variable selection criterion is expected to have a high probability of selecting the true subset $j_{*}$, because in general the probability of selecting the true subset is approximated by the asymptotic probability. To this end, let LS, LR, LE and LTE denote the large-sample, large-response vector, large-explanatory vector and large-true explanatory vector asymptotic frameworks, in which only $n$, $p$, $k$ and $k_{*}$ tend to infinity, respectively. For the consistency of variable selection criteria under the full search method, [4, 27, 28] used asymptotic frameworks with $(p + k)/n \to c \in [0, 1)$.

Preliminaries

Proposed selection method

Consistency of proposed selection method

Extension of the ZKB selection method

Numerical studies


Journal: Electronic Journal of Statistics	Publication Date: Jan 1, 2020
Citations: 7	License type: cc-by


