A Tool for Interactive Data Visualization: Application to Over 10,000 Brain Imaging and Phantom MRI Data Sets.

Sandeep R Panta, Jill Fries, Tor D Wager, Michael Milham, Margaret King, Kent Kiehl, Marie Banich, Sergey M Plis, Vince D Calhoun, Jessica A Turner, Nicole Speer, Ravi Kalyanam, Runtang Wang

doi:10.3389/fninf.2016.00009

Abstract

In this paper we propose a web-based approach for quick visualization of big data from brain magnetic resonance imaging (MRI) scans using a combination of an automated image capture and processing system, nonlinear embedding, and interactive data visualization tools. We draw upon thousands of MRI scans captured via the COllaborative Imaging and Neuroinformatics Suite (COINS). We then interface the output of several analysis pipelines based on structural and functional data to a t-distributed stochastic neighbor embedding (t-SNE) algorithm, which reduces each scan in the input data set to two dimensions while preserving the local structure of the data. Finally, we interactively display the output of this approach via a web page built on the Data-Driven Documents (D3) JavaScript library. Two distinct approaches were used to visualize the data. In the first approach, we computed multiple quality control (QC) values from pre-processed data, which were used as inputs to the t-SNE algorithm. This approach helps in assessing the quality of each data set relative to others. In the second approach, computed variables of interest (e.g., brain volume or voxel values from segmented gray matter images) were used as inputs to the t-SNE algorithm. This approach helps in identifying interesting patterns in the data sets. We demonstrate these approaches using multiple examples from over 10,000 data sets, including (1) quality control measures calculated from phantom data over time, (2) quality control data from human functional MRI data across various studies, scanners, and sites, and (3) volumetric and density measures from human structural MRI data across various studies, scanners, and sites. Results from (1) and (2) show the potential of our approach to combine t-SNE data reduction with interactive color coding of variables of interest to quickly identify visually unique clusters of data (e.g., data sets with poor QC, clustering of data by site). Results from (3) demonstrate interesting patterns of gray matter and volume, and evaluate how they map onto variables including scanners, age, and gender. In sum, the proposed approach allows researchers to rapidly identify and extract meaningful information from big data sets. Such tools are becoming increasingly important as data sets grow larger.
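
The reduction step described above (one feature vector per scan, embedded in two dimensions with t-SNE) can be illustrated with a minimal sketch. This is not the authors' code: it is a bare-bones, standard-library-only t-SNE that uses plain gradient descent, without the momentum and early exaggeration that common implementations add. The function names, the synthetic `scans` data (two groups of four made-up features), and the settings for `perplexity`, `n_iter`, and `lr` are all assumptions chosen for illustration.

```python
import math
import random

def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    n = len(X)
    return [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]

def conditional_probs(d_row, i, perplexity, tol=1e-5, max_iter=50):
    """Search for the Gaussian precision beta that gives row i's
    conditional distribution p(j|i) the requested perplexity."""
    target = math.log(perplexity)
    # Shift by the nearest-neighbor distance so at least one weight equals 1.
    d_min = min(d for j, d in enumerate(d_row) if j != i)
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        w = [0.0 if j == i else math.exp(-(d - d_min) * beta)
             for j, d in enumerate(d_row)]
        total = sum(w)
        p = [v / total for v in w]
        entropy = -sum(v * math.log(v) for v in p if v > 0.0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:      # distribution too flat: raise beta
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:                     # distribution too peaked: lower beta
            hi = beta
            beta = (beta + lo) / 2.0
    return p

def tsne(X, perplexity=5.0, n_iter=300, lr=20.0, seed=0):
    """Embed the rows of X in 2-D by gradient descent on KL(P || Q)."""
    n = len(X)
    D = sq_dists(X)
    cond = [conditional_probs(D[i], i, perplexity) for i in range(n)]
    # Symmetrized joint probabilities in the high-dimensional space.
    P = [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1e-2), rng.gauss(0.0, 1e-2)] for _ in range(n)]
    for _ in range(n_iter):
        # Student-t kernel between the current 2-D points.
        W = [[0.0 if i == j else
              1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2 + (Y[i][1] - Y[j][1]) ** 2)
              for j in range(n)] for i in range(n)]
        Z = sum(map(sum, W))
        grads = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                # Gradient term: 4 (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)
                c = 4.0 * (P[i][j] - W[i][j] / Z) * W[i][j]
                gx += c * (Y[i][0] - Y[j][0])
                gy += c * (Y[i][1] - Y[j][1])
            grads.append((gx, gy))
        for i, (gx, gy) in enumerate(grads):
            Y[i][0] -= lr * gx
            Y[i][1] -= lr * gy
    return Y, P

# Synthetic stand-in for per-scan feature vectors (hypothetical values).
rng = random.Random(1)
scans = ([[rng.gauss(0.0, 0.3) for _ in range(4)] for _ in range(10)] +
         [[rng.gauss(3.0, 0.3) for _ in range(4)] for _ in range(10)])
Y, P = tsne(scans)
for idx, (x, y) in enumerate(Y):
    print(idx, round(x, 3), round(y, 3))
```

Each printed row is one scan's 2-D coordinate, which is the kind of output an interactive scatter plot could then color by a variable of interest such as site or scanner. For real data sets, an optimized library implementation would be a more practical choice than this quadratic-time loop.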

Highlights

Visualizing high-dimensional data in a quick and simple way to produce meaningful information is a major challenge in the field of big data.
Once these patterns are identified, a deeper investigation of the respective data sets could reveal more detailed useful information. Such an approach does not replace standard quality control (QC) approaches, but we contend that additional value is added by giving users a high-level view of their data. As we show, such a view can reveal information that is not detected with standard QC and provide a useful exploratory tool to interactively identify how variables of interest are encoded within the data or to assess how similar newly collected data are to existing data sets.
We demonstrate our approaches with the following use cases: (1) quality control measures calculated from phantom data, (2) quality control metrics computed from human functional magnetic resonance imaging (MRI) data across various studies, scanners, and sites, (3) volumetric measures from human structural MRI data across various studies, scanners, and sites, and (4) gray matter density values from all brain voxels.

Summary

Introduction

Visualizing high-dimensional data in a quick and simple way to produce meaningful information is a major challenge in the field of big data. Perhaps most challenging are data aggregation initiatives attempting to pool data across multiple imaging sites, especially when the data are heterogeneous (i.e., data collection protocols differ across sites; Potkin and Ford, 2009; Van Horn and Toga, 2009; Consortium, 2012; Di Martino et al., 2013). These challenges of working with high-dimensional data sets make simple and efficient quality control and information extraction from brain imaging data quite demanding.

Objectives

Methods

Results

Discussion

Conclusion