Abstract
Identifying influential observations is crucial in data analysis, particularly for high-dimensional datasets, in which the number of predictors exceeds the sample size. Such rich datasets are increasingly exploited in many fields of science, e.g., genomics, neuroscience, and finance. Unfortunately, classical diagnostic tools are not designed to identify influential observations in the high-dimensional setting. In this paper, we use the concept of expectiles to develop an influence measure for high-dimensional regression. The influence measure is based on the asymmetric marginal correlation, and we derive its asymptotic distribution and use it to define a statistically principled threshold. Comprehensive simulations demonstrate the favorable performance of this influence measure under various scenarios. The usefulness of the proposed measure is illustrated through the analysis of a neuroimaging dataset. An R package implementing the procedure is publicly available on GitHub (https://github.com/AmBarry/hidetify).
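The abstract does not spell out the influence measure itself. As background only, the following is a minimal Python sketch of the sample τ-expectile that the measure builds on, i.e. the value m minimizing Σ|τ − 1(y_i < m)|(y_i − m)², computed by the standard asymmetric least-squares fixed-point iteration. The function name `expectile` and its `tol`/`max_iter` parameters are illustrative assumptions; this is not the authors' influence measure and not the hidetify API.

```python
def expectile(y, tau, tol=1e-10, max_iter=1000):
    """Sample tau-expectile via asymmetric least squares (illustrative sketch).

    Solves sum_i w_i * (y_i - m) = 0, where w_i = tau if y_i > m else 1 - tau,
    by repeatedly recomputing a weighted mean with the current weights.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    values = [float(v) for v in y]
    if not values:
        raise ValueError("y must be non-empty")
    m = sum(values) / len(values)  # start at the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau, the rest 1 - tau.
        weights = [tau if v > m else 1.0 - tau for v in values]
        m_new = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    for tau in (0.1, 0.5, 0.9):
        print(tau, expectile(data, tau))
```

With τ = 0.5 all weights are equal and the expectile reduces to the sample mean; τ above 0.5 pulls the estimate toward the upper tail and τ below 0.5 toward the lower tail, which is the asymmetry an expectile-based measure can exploit.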