Abstract

Identifying influential observations is crucial in data analysis, particularly for high-dimensional datasets, where the number of predictors exceeds the sample size. Such richly detailed datasets are increasingly analyzed in many fields of science, e.g., genomics, neuroscience, and finance. Unfortunately, classical statistical diagnostic tools are not tailored to identifying influential observations in the high-dimensional setting. In this paper, we use the concept of expectiles to develop an influence measure for high-dimensional regression. The influence measure is based on the asymmetric marginal correlation, and its derived asymptotic distribution is used to define a statistically principled threshold. Our comprehensive simulation results demonstrate the favorable properties of this influence measure under various scenarios. The usefulness of the proposed measure is illustrated through the analysis of a neuroimaging dataset. An R package implementing the procedure is publicly available on GitHub (https://github.com/AmBarry/hidetify).

