Abstract

Neuroimaging allows for the non-invasive study of the brain in rich detail. Data-driven discovery of patterns of population variability in the brain has the potential to be extremely valuable for early disease diagnosis and understanding the brain. The resulting patterns can be used as imaging-derived phenotypes (IDPs), and may complement existing expert-curated IDPs. However, population datasets, comprising many different structural and functional imaging modalities from thousands of subjects, provide a computational challenge not previously addressed. Here, for the first time, a multimodal independent component analysis approach is presented that is scalable for data fusion of voxel-level neuroimaging data in the full UK Biobank (UKB) dataset, that will soon reach 100,000 imaged subjects. This new computational approach can estimate modes of population variability that enhance the ability to predict thousands of phenotypic and behavioural variables using data from UKB and the Human Connectome Project. A high-dimensional decomposition achieved improved predictive power compared with widely-used analysis strategies, single-modality decompositions and existing IDPs. In UKB data (14,503 subjects with 47 different data modalities), many interpretable associations with non-imaging phenotypes were identified, including multimodal spatial maps related to fluid intelligence, handedness and disease, in some cases where IDP-based approaches failed.

Highlights

  • Large-scale multimodal brain imaging has enormous potential for boosting epidemiological and neuroscientific studies, generating markers for early disease diagnosis and prediction of disease progression, and the understanding of human cognition, by means of linking to clinical or behavioural variables

  • When we correlated each of the BigFLICA modes and image-derived phenotypes (IDPs) with the fluid intelligence score in the UK Biobank (UKB), we found that several task-fMRI-related BigFLICA modes have the strongest associations (Fig. 5a)

  • We presented BigFLICA, a scalable and tuneable multimodal data fusion approach for analyzing the full UK Biobank neuroimaging dataset and other large-scale multimodal imaging studies


Introduction

Large-scale multimodal brain imaging has enormous potential for boosting epidemiological and neuroscientific studies, generating markers for early disease diagnosis and prediction of disease progression, and furthering the understanding of human cognition, by linking imaging to clinical or behavioural variables. Large-scale neuroimaging studies first summarize the imaging data into interpretable image-derived phenotypes (IDPs)1,5, which are scalar quantities derived from raw imaging data (e.g., regional volumes from structural MRI, mean task activations from task fMRI, resting-state functional connectivities between brain parcels). This knowledge-based approach is simple and efficient, and effectively reduces the high-dimensional data into interpretable, compact, convenient features. However, there may well be a large loss of information, because such "expert-hand-designed" features may fail to capture important sources of subject variability (or may simply lose sensitivity through suboptimally pre-defined spatial sub-areas), and because they ignore cross-modality relationships. Such uni-modal, compartmentalised analyses do not exploit the fact that, for many biological effects of interest, we expect biological convergence across different data modalities, i.e., changes in the underlying biological phenotype likely manifest themselves across multiple quantitative phenotypes, so that a joint analysis effectively increases both the power to detect such effects and the interpretability of the findings.
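The core idea of joint decomposition can be illustrated with a minimal sketch: concatenate two modalities' subject-by-voxel matrices along the voxel axis and run a single ICA, so that each estimated mode has one loading per subject shared across modalities, plus a spatial map that splits back into per-modality maps. This is not the BigFLICA algorithm itself (which uses a Bayesian FLICA model with dimension-reduction steps for scalability); it is only a toy demonstration using simulated data and scikit-learn's FastICA, with all sizes chosen arbitrarily.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_subjects, n_vox1, n_vox2, n_comp = 100, 500, 400, 5

# Simulated subject-by-voxel data for two modalities that share
# the same underlying subject-level variation ("modes").
shared = rng.normal(size=(n_subjects, n_comp))
X1 = shared @ rng.normal(size=(n_comp, n_vox1)) + 0.1 * rng.normal(size=(n_subjects, n_vox1))
X2 = shared @ rng.normal(size=(n_comp, n_vox2)) + 0.1 * rng.normal(size=(n_subjects, n_vox2))

# Concatenate modalities along the feature (voxel) axis, then decompose jointly:
# one subject-loading matrix is estimated across both modalities at once.
X = np.hstack([X1, X2])
ica = FastICA(n_components=n_comp, random_state=0)
subject_loadings = ica.fit_transform(X)   # (n_subjects, n_comp)
spatial_maps = ica.components_            # (n_comp, n_vox1 + n_vox2)

# Each mode's joint spatial map splits back into per-modality maps.
maps_mod1 = spatial_maps[:, :n_vox1]
maps_mod2 = spatial_maps[:, n_vox1:]
```

The shared `subject_loadings` matrix is what would then be correlated with non-imaging phenotypes (e.g., fluid intelligence), which is why a joint analysis can gain power over analysing each modality separately.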

