Abstract
For large-scale testing with graph-associated data, we present an empirical Bayes mixture technique to score local false-discovery rates (FDRs). Compared to procedures that ignore the graph, the proposed Graph-based Mixture Model (GraphMM) method gains power in settings where non-null cases form connected subgraphs, and it does so by regularizing parameter contrasts between testing units. Simulations show that GraphMM controls the FDR in a variety of settings, though it may lose control with excessive regularization. On magnetic resonance imaging data from a study of brain changes associated with the onset of Alzheimer’s disease, GraphMM produces greater yield than conventional large-scale testing procedures.
Highlights
Empirical Bayesian methods provide a useful approach to large-scale hypothesis testing in genomics, brain-imaging, and other application areas
We investigate the proposed GraphMM method's properties using a variety of synthetic-data scenarios, and we apply it to identify statistically significant changes in brain structure associated with the onset of mild cognitive impairment
We consider structural brain imaging data from a group of MX = 123 normal control subjects and a second group of MY = 148 subjects suffering from late-stage mild cognitive impairment (MCI), a precursor to Alzheimer’s disease (AD), and for the simulation we focus on a single coronal slice containing N = 5236 voxels
Summary
Empirical Bayesian methods provide a useful approach to large-scale hypothesis testing in genomics, brain imaging, and other application areas. The analyst performs univariate testing en masse, with the final unit-specific scores and discoveries dependent upon the chosen empirical Bayesian method, which accounts for the collective properties of the separate statistics to gain an advantage (e.g., Storey 2003; Efron 2010; Stephens 2017). These methods are effective but may be underpowered in some applied problems when the underlying effects are relatively weak. We conjecture that power is gained for graph-associated data by moving upstream in the data-reduction process and by recognizing low-complexity parameter states.
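To make the empirical Bayes testing framework concrete, the sketch below illustrates the standard two-group local FDR computation that GraphMM builds upon: the marginal density of z-scores is modeled as a mixture of a null and a non-null component, and each unit is scored by its posterior null probability. This is a minimal illustration of the generic two-group model, not the GraphMM procedure itself; the simulation settings (null proportion, effect size, the Storey-type estimator of the null proportion) are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate z-scores: 90% null N(0,1), 10% non-null N(3,1) (illustrative choices)
n = 5000
is_null = rng.random(n) < 0.9
z = np.where(is_null, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))

# Storey-type estimate of the null proportion pi0 from two-sided p-values
p = 2 * stats.norm.sf(np.abs(z))
lam = 0.5
pi0 = min(1.0, np.mean(p > lam) / (1 - lam))

# Estimate the marginal density f(z) by kernel density estimation
f_hat = stats.gaussian_kde(z)(z)

# Local FDR: posterior probability that a unit is null given its z-score,
# lfdr(z) = pi0 * f0(z) / f(z), with theoretical null f0 = N(0,1)
lfdr = np.clip(pi0 * stats.norm.pdf(z) / f_hat, 0.0, 1.0)

# Declare discoveries at a conventional lfdr threshold
discoveries = lfdr <= 0.2
print(discoveries.sum(), "discoveries")
```

A graph-aware method such as GraphMM departs from this unit-by-unit scoring by sharing information across neighboring testing units, which is where the power gain for connected non-null subgraphs comes from.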