The top‐K tau‐path screen for monotone association in subpopulations

Srinath Sampath, Adriano Caloiaro, Wayne Johnson, Joseph S Verducci

doi:10.1002/wics.1382

Abstract

Two variables that tend to rise and fall either together or in opposition are said to be monotonically associated. For certain phenomena, this tendency is causally restricted to a subpopulation, as when, e.g., the severity of an allergic reaction trends with the concentration of an air pollutant. Previously, Yu et al. (Stat Methodol 2011, 8:97–111) devised a method of rearranging observations to test whether such an association might be present in a subpopulation of paired data. However, the computational intensity of the method limited its application to relatively small samples, and the test itself only judges whether association is present in some subpopulation; it does not clearly identify the subsample that came from this subpopulation, especially when the whole sample tests positive. The present study adds a ‘top‐K’ feature (Sampath S, Verducci JS. Stat Anal Data Min 2013, 6:458–471), based on a multistage ranking model, that identifies a concise subsample likely to contain a high proportion of observations from the subpopulation in which the association is supported. Computational improvements incorporated into this top‐K tau‐path algorithm now allow the method to be extended to thousands of pairs of variables measured on sample sizes in the thousands. A description of the new algorithm, along with measures of computational complexity and practical efficiency, helps to gauge its potential use in different settings. Simulation studies catalog its accuracy in various settings, and an example from finance illustrates its step‐by‐step use.

WIREs Comput Stat 2016, 8:206–218. doi: 10.1002/wics.1382

This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
