Abstract

For the analysis of high-dimensional data, regularization methods based on penalized likelihood have been studied extensively over the past few decades. However, they commonly require an optimal choice of tuning parameters to select relevant variables. Although cross-validation is widely used for tuning parameter selection, its results are often unstable because of the random splitting of samples. As an alternative to cross-validation, the computation of selection probabilities has been proposed for stable variable selection. Individual variables can be ranked by their selection probabilities regardless of the tuning parameter values. However, the theoretical threshold on the selection probability fails to control the number of false discoveries when applied to high-dimensional correlated data. In this article, we propose a new strategy for computing an empirical threshold of the selection probability. The selection performance of the proposed threshold is evaluated through extensive simulation studies and the analysis of high-dimensional genomic data.
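To illustrate the selection-probability idea the abstract describes, the sketch below computes, for each variable, the fraction of random subsamples in which it is selected by a penalized regression fit, maximized over a grid of tuning parameters. This is a minimal illustration only, not the article's proposed method: the lasso base selector, the coordinate-descent solver, the subsample fraction, and all function names are assumptions for the sketch.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    # Cyclic coordinate descent for the lasso objective
    # (1/(2n))||y - Xb||^2 + lam * ||b||_1, assuming standardized columns.
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)  # per-column sum of squares
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding variable j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_ss[j]
    return beta

def selection_probability(X, y, lams, n_sub=100, frac=0.5, seed=0):
    # Selection probability of each variable: the fraction of random
    # subsamples in which it receives a nonzero lasso coefficient for
    # at least one tuning parameter in the grid `lams`.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    m = int(frac * n)
    for _ in range(n_sub):
        idx = rng.choice(n, size=m, replace=False)
        selected = np.zeros(p, dtype=bool)
        for lam in lams:
            beta = lasso_cd(X[idx], y[idx], lam)
            selected |= beta != 0
        counts += selected
    return counts / n_sub

# Toy usage: 3 relevant variables out of 20 (simulated data, for illustration).
rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(n)
probs = selection_probability(X, y, lams=[0.1, 0.3, 0.5], n_sub=50)
```

Because the probabilities are maximized over the tuning-parameter grid, the resulting ranking does not depend on choosing a single tuning parameter; the article's contribution concerns how to threshold these probabilities, which this sketch does not attempt.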
