Abstract

With the rapid development of information technology, contemporary datasets from a variety of fields have become extremely large. In many of these datasets the number of features far exceeds the sample size; such data are called high-dimensional. In statistics, variable selection approaches are required to extract useful information from high-dimensional data. The most popular approach, called the penalized likelihood method, combines the log-likelihood function with a penalty function controlled by a tuning parameter. However, almost all penalized likelihood approaches consider only noise accumulation and spurious correlation while ignoring endogeneity, which also arises frequently in high-dimensional space. In this paper, we explore the causes of endogeneity and its influence on penalized likelihood approaches. Simulations based on five classical penalized approaches are provided to demonstrate their inconsistency under endogeneity. The results show that when endogenous variables exist, the positive selection rate of all five approaches increases gradually but the false selection rate does not consistently decrease; that is, the approaches do not satisfy selection consistency.
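The conclusion above rests on two selection metrics. The sketch below shows how they are commonly computed, assuming the positive selection rate (PSR) is the fraction of truly relevant variables that are selected and the false selection rate (FSR) is the fraction of selected variables that are irrelevant. These definitions, the empty-selection convention and the function names `selection_rates` and `average_rates` are assumptions for illustration, not formulas taken from the paper.

```python
from statistics import mean


def selection_rates(selected, true_support):
    """Return (PSR, FSR) for one fitted model (hypothetical helper).

    PSR = |selected & true| / |true|      share of true variables recovered
    FSR = |selected - true| / |selected|  share of selected variables that are false
    """
    selected, true_support = set(selected), set(true_support)
    if not true_support:
        raise ValueError("true_support must be non-empty")
    psr = len(selected & true_support) / len(true_support)
    # Assumed convention: an empty selection contains no false selections.
    fsr = len(selected - true_support) / len(selected) if selected else 0.0
    return psr, fsr


def average_rates(selections, true_support):
    """Average PSR and FSR over simulation replicates (hypothetical helper)."""
    pairs = [selection_rates(s, true_support) for s in selections]
    return mean(p for p, _ in pairs), mean(f for _, f in pairs)


truth = {0, 1, 2, 3}
print(selection_rates({0, 1, 2, 7}, truth))  # (0.75, 0.25)
print(average_rates([{0, 1, 2, 7}, {0, 1, 2, 3}], truth))  # (0.875, 0.125)
```

Selection consistency implies PSR → 1 and FSR → 0 as the sample size grows; the abstract reports that under endogeneity the second condition fails.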

Highlights

  • Along with the rapid progress of information technology and the electronics industry, ever more data are being collected in biomedicine, econometrics and other fields

  • Variable selection approaches are required to extract useful information from high-dimensional data

  • The most popular approach, called the penalized likelihood method, combines the log-likelihood function with a penalty function controlled by a tuning parameter
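In the usual notation (an assumption, since the paper's own symbols are not shown in this summary), with log-likelihood ℓ(β), p coefficients, sample size n, penalty function p_λ and tuning parameter λ ≥ 0, the penalized likelihood estimator and the two penalty families named in the outline, Lasso and SCAD, can be written as:

```latex
\begin{aligned}
\hat{\beta} &= \arg\max_{\beta} \Big\{ \ell(\beta) - n \sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \Big\} \\
\text{Lasso:}\quad p_{\lambda}(t) &= \lambda t, \qquad t \ge 0 \\
\text{SCAD:}\quad p'_{\lambda}(t) &= \lambda \Big\{ I(t \le \lambda) + \frac{(a\lambda - t)_{+}}{(a-1)\lambda}\, I(t > \lambda) \Big\}, \qquad t \ge 0,\ a > 2
\end{aligned}
```

A larger λ sets more coefficients exactly to zero. The SCAD derivative vanishes for t > aλ, so large coefficients are left unpenalized, which removes the constant shrinkage bias that the Lasso applies to them; a = 3.7 is a commonly recommended choice.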


Summary

Introduction

Along with the rapid progress of information technology and the electronics industry, ever more data are being collected in biomedicine, econometrics and other fields. To extract valid information from such massive data, high-dimensional variable selection has become an active research topic in statistics. Compared with traditional data analysis, variable selection in high-dimensional space increases the computational burden and leads to noise accumulation, spurious correlation and endogeneity [1]. When important variables are highly correlated with some redundant variables, the redundant variables may also be selected, producing spurious selections. To counter this, a penalty function is usually added to the likelihood. Most penalized likelihood methods consider noise accumulation and spurious correlation but ignore another important factor, endogeneity [3]. This paper, which studies the influence of endogeneity on classical penalized likelihood methods, is divided into three parts. First, it introduces the origin and causes of endogeneity; second, it summarizes the classical penalized likelihood methods and their development; third, a comparative analysis shows the inconsistency of various penalized likelihood approaches under endogeneity.
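A minimal sketch of how endogeneity can defeat a penalized method, assuming a hypothetical linear model with Gaussian data: y = 2·x0 + ε, where x1 has true coefficient zero but is correlated with the error ε (correlation 0.7), which is one simple form of endogeneity, and x2 to x5 are irrelevant exogenous variables. The coefficients, n = 2000 and λ = 0.25 are illustrative choices, not the paper's simulation settings. The Lasso is fitted by cyclic coordinate descent in plain Python; `lasso_cd` is a hypothetical helper, not a library function.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, lam, max_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    `cols` holds the columns of X as lists. No intercept is fitted, so the
    data are assumed to be (approximately) centred.
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # residual y - X beta, starting from beta = 0
    scale = [sum(v * v for v in col) / n for col in cols]  # (1/n) x_j'x_j
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            # (1/n) x_j' times the partial residual that excludes x_j's own term
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * col[i]  # keep resid = y - X beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical data-generating process (illustrative values only).
rng = random.Random(2024)
n = 2000
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # relevant, exogenous
x1 = [0.7 * e + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0.0, 1.0) for e in eps]  # irrelevant, endogenous
noise_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]  # irrelevant, exogenous
y = [2.0 * a + e for a, e in zip(x0, eps)]

beta = lasso_cd([x0, x1] + noise_cols, y, lam=0.25)
selected = {j for j, b in enumerate(beta) if b != 0.0}
print("estimates:", [round(b, 3) for b in beta])
print("selected:", sorted(selected), "true support: [0]")
```

Because x1 is correlated with the error, its covariance with y stays near 0.7 in the population no matter how large n becomes, so the Lasso keeps selecting it; more data sharpen a nonzero estimate for x1 instead of shrinking it to zero. This is a single replicate of one Lasso fit, not a study of selection rates across the five methods.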

The Origin and Cause of Endogeneity
Penalized Likelihood Method and Its Development
Lasso and Improvements
SCAD and Related
Tuning Parameter
Inconsistency under Endogeneity
Specification of Model
Results and Interpretation