Abstract
For large-scale testing with graph-associated data, we present an empirical Bayes mixture technique to score local false-discovery rates (FDRs). Compared to procedures that ignore the graph, the proposed Graph-based Mixture Model (GraphMM) method gains power in settings where non-null cases form connected subgraphs, and it does so by regularizing parameter contrasts between testing units. Simulations show that GraphMM controls the FDR in a variety of settings, though it may lose control with excessive regularization. On magnetic resonance imaging data from a study of brain changes associated with the onset of Alzheimer’s disease, GraphMM produces greater yield than conventional large-scale testing procedures.
Highlights
Empirical Bayesian methods provide a useful approach to large-scale hypothesis testing in genomics, brain-imaging, and other application areas
We investigate the proposed GraphMM method's properties using a variety of synthetic-data scenarios, and we apply it to identify statistically significant changes in brain structure associated with the onset of mild cognitive impairment
We consider structural brain imaging data from a group of MX = 123 normal control subjects and a second group of MY = 148 subjects suffering from late-stage mild cognitive impairment (MCI), a precursor to Alzheimer’s disease (AD), and for the simulation we focus on a single coronal slice containing N = 5236 voxels
Summary
Empirical Bayesian methods provide a useful approach to large-scale hypothesis testing in genomics, brain imaging, and other application areas. The analyst performs univariate testing en masse, with the final unit-specific scores and discoveries dependent upon the chosen empirical Bayesian method, which accounts for the collective properties of the separate statistics to gain an advantage (e.g., Storey 2003; Efron 2010; Stephens 2017). These methods are effective but may be underpowered in some applied problems when the underlying effects are relatively weak. We conjecture that power is gained for graph-associated data by moving upstream in the data-reduction process and by recognizing low-complexity parameter states.
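To make the empirical Bayes testing framework concrete, the sketch below illustrates the standard two-group local FDR computation that GraphMM builds upon: the marginal density of z-scores is modeled as a mixture of a null and a non-null component, and each unit is scored by its posterior null probability. This is a minimal illustration of the generic two-group model, not the GraphMM procedure itself; the simulation settings (null proportion, effect size, the Storey-type estimator of the null proportion) are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate z-scores: 90% null N(0,1), 10% non-null N(3,1) (illustrative choices)
n = 5000
is_null = rng.random(n) < 0.9
z = np.where(is_null, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))

# Storey-type estimate of the null proportion pi0 from two-sided p-values
p = 2 * stats.norm.sf(np.abs(z))
lam = 0.5
pi0 = min(1.0, np.mean(p > lam) / (1 - lam))

# Estimate the marginal density f(z) by kernel density estimation
f_hat = stats.gaussian_kde(z)(z)

# Local FDR: posterior probability that a unit is null given its z-score,
# lfdr(z) = pi0 * f0(z) / f(z), with theoretical null f0 = N(0,1)
lfdr = np.clip(pi0 * stats.norm.pdf(z) / f_hat, 0.0, 1.0)

# Declare discoveries at a conventional lfdr threshold
discoveries = lfdr <= 0.2
print(discoveries.sum(), "discoveries")
```

A graph-aware method such as GraphMM departs from this unit-by-unit scoring by sharing information across neighboring testing units, which is where the power gain for connected non-null subgraphs comes from.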