Abstract

In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decrease in variance. In particular, a higher criticism test based on the pairwise differences is shown to achieve the detection boundary defined by the (oracle) likelihood ratio test. We then turn to a Gaussian copula model where the marginal distributions are unknown. Standard invariance considerations lead us to consider rank tests. In fact, a higher criticism test based on the pairwise rank differences achieves the detection boundary in the normal mixture model, although not in the very sparse regime. We do not know of any rank test that has any power in that regime.
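The reduction described above can be illustrated with a short simulation. If $X$ and $Y$ are standard normal with correlation $\rho$, then $U = (X - Y)/\sqrt{2}$ has variance $1 - \rho$, so positive dependence shows up as a decrease in variance. The sketch below makes several assumptions. Both mixture components have $N(0,1)$ marginals. The higher criticism statistic takes a Donoho-Jin form, and the paper's exact variant may differ. For the rank version, p-values are taken from the null distribution of $|R_i - S_i|$, which is a hypothetical construction. All function names (`simulate`, `higher_criticism`, `hc_differences`, `ranks`, `rank_tail`, `hc_rank_differences`) are hypothetical.

```python
import math
import random


def simulate(n, eps, rho, seed=0):
    """Hypothetical helper: draw n pairs from a bivariate contamination model.

    With probability 1 - eps a pair is independent N(0, 1) x N(0, 1); with
    probability eps it is standard bivariate normal with correlation rho.
    Both components have N(0, 1) marginals (an assumption of this sketch)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r = rho if rng.random() < eps else 0.0
        xs.append(z1)
        ys.append(r * z1 + math.sqrt(1.0 - r * r) * z2)
    return xs, ys


def higher_criticism(pvalues, alpha0=0.5):
    """Donoho-Jin style HC statistic over the smallest alpha0 * n p-values,
    skipping p_(i) < 1/n to avoid the instability at tiny p-values."""
    n = len(pvalues)
    p = sorted(pvalues)
    best = -math.inf
    for i in range(1, int(alpha0 * n) + 1):
        pi = p[i - 1]
        if pi < 1.0 / n or pi >= 1.0:
            continue
        best = max(best, math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi)))
    return best


def hc_differences(xs, ys):
    # U_i = (X_i - Y_i) / sqrt(2) is N(0, 1) under independence and
    # N(0, 1 - rho) in the contaminated component, so small |U_i| is evidence.
    # Left-tail p-value: P(|Z| <= |U_i|) = erf(|U_i| / sqrt(2)) = erf(|X_i - Y_i| / 2).
    return higher_criticism([math.erf(abs(x - y) / 2.0) for x, y in zip(xs, ys)])


def ranks(values):
    """Ranks 1..n of the values (ties have probability zero here)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def rank_tail(d, n):
    """P(|R - S| <= d) for R, S independent and uniform on {1, ..., n}."""
    return ((2 * d + 1) * n - d * (d + 1)) / (n * n)


def hc_rank_differences(xs, ys):
    # Under independence, R_i and S_i are independent uniforms on {1..n} for
    # each fixed i, so rank_tail gives a conservative left-tail p-value.
    # Only ranks are used, so the statistic is invariant to strictly
    # increasing transformations of either marginal.
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    return higher_criticism([rank_tail(abs(a - b), n) for a, b in zip(rx, ry)])


if __name__ == "__main__":
    null = simulate(5000, eps=0.0, rho=0.0, seed=1)
    alt = simulate(5000, eps=0.3, rho=0.99, seed=2)
    print("HC on differences:     ", hc_differences(*null), hc_differences(*alt))
    print("HC on rank differences:", hc_rank_differences(*null), hc_rank_differences(*alt))
```

Because the rank version depends on the data only through the ranks, replacing the X's or the Y's by any strictly increasing function of them leaves it unchanged. This is the invariance that motivates rank tests in the copula model with unknown marginals.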

Highlights

  • Following a theoretical investigation initiated in large part by Ingster [16] and broadened by Donoho and Jin [10], we study two-component mixture models, known as contamination models, in various asymptotic regimes defined by how fast the small mixture weight converges to zero.

  • We find that the covariance test and the higher criticism test match the asymptotic performance of the likelihood ratio test to first order, while the extremes test has no power.

  • The power residing in the $V_i$ … In Proposition 2 we established that the higher criticism test based on $U_1, \dots, U_n$ achieves the detection boundary in the Gaussian mixture model.
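The covariance test mentioned in the highlights can be sketched as follows. The sketch assumes that the test rejects for large values of the normalized sum of products $T = n^{-1/2}\sum_i X_i Y_i$ and that both marginals are known to be standard normal. Under these assumptions $E[X_i Y_i] = \varepsilon\rho$ under the mixture. The normalization, the one-sided p-value, and the helper names `simulate` and `covariance_test` are illustrative assumptions, not the paper's definitions.

```python
import math
import random


def simulate(n, eps, rho, seed=0):
    """Hypothetical helper: n pairs, each independent N(0, 1) x N(0, 1) with
    probability 1 - eps, or standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r = rho if rng.random() < eps else 0.0
        xs.append(z1)
        ys.append(r * z1 + math.sqrt(1.0 - r * r) * z2)
    return xs, ys


def covariance_test(xs, ys):
    """Return (T, p-value) for T = sum_i X_i * Y_i / sqrt(n).

    With standardized marginals, X_i * Y_i has mean 0 and variance 1 under the
    null, so T is approximately N(0, 1) by the CLT; under the mixture its mean
    is sqrt(n) * eps * rho > 0, so the test rejects for large T."""
    n = len(xs)
    t = sum(x * y for x, y in zip(xs, ys)) / math.sqrt(n)
    return t, 0.5 * math.erfc(t / math.sqrt(2.0))  # one-sided p-value 1 - Phi(t)


if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.3):
        t, p = covariance_test(*simulate(5000, eps=eps, rho=0.9, seed=3))
        print(f"eps={eps}: T={t:.2f}, p={p:.3g}")
```

The statistic pools the signal over all pairs, so it does well when the contamination is not too sparse. That is consistent with the highlights contrasting it with the extremes test.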


Summary

Introduction

The detection of rare effects has long been an important problem in a variety of settings, and may be relevant today, for example, in the search for personalized care in the health industry, where a small fraction of a population may respond well, or poorly, to a given treatment [20]. Here, by contrast, we are interested in bivariate data, and in a situation where the effect is felt in the dependence between the two variables being measured. This setting has recently been considered in the literature in the context of assessing the reproducibility of studies. The authors of [18] aimed to identify significant features from separate studies using an expectation-maximization (EM) algorithm. They applied a copula mixture model and assumed that changes in the mean and covariance matrix differentiate the contaminated component from the null component.

Gaussian mixture model
Gaussian mixture copula model
The likelihood ratio test
The covariance test
Numerical experiments
The covariance rank test
The higher criticism rank test
Discussion