Mapping neuroimaging resources into the NIDASH Data Model for federated information retrieval

Ghosh Satrajit,Haselgrove Christian,Keator David,Nichols B Nolan,Poline Jean-Baptiste,Steffener Jason,Stoner Richard

doi:10.3389/conf.fninf.2013.09.00055

Abstract

B. Nolan Nichols1*, Jason Steffener2, Christian Haselgrove3, David B. Keator4, Richard M. Stoner5, Jean-Baptiste Poline6 and Satrajit S. Ghosh7

1 University of Washington, United States
2 Columbia University, United States
3 University of Massachusetts Medical School, United States
4 University of California, Irvine, United States
5 University of California, San Diego, United States
6 University of California, Berkeley, United States
7 Massachusetts Institute of Technology, United States

Introduction

The astounding influx of human brain imaging data makes data annotation and sharing an essential aspect of modern neuroimaging research. However, no neuroimaging data exchange standard exists that makes consuming and publishing shared neuroimaging data simple and meaningful to researchers. In this work, we use the NIDASH Data Model (NI-DM; [1]), a neuroimaging domain specific extension to the W3C PROV Data Model [2], to create NI-DM Object Models that represent neuroimaging resources from the general context of provenance information. NI-DM is a key component of an effort to build a larger Semantic Web and Linked Data framework for the generation, storage and query of persistent brain imaging data (and associated metadata) in the context of existing ontologies.

Methods

We developed NI-DM Object Models to integrate three common brain imaging data modeling patterns: 1) database schemas, 2) standard directory structures, and 3) csv/text files (Figure 1). The ADHD200 (973 participants) dataset was downloaded from the NITRC Image Repository [3], an XNAT database [4]. The T1 weighted anatomical scans for each participant were processed using the ‘recon-all’ tool from FreeSurfer (FS) Version 5.1 [5], and additional phenotypic data was downloaded as a CSV file from NITRC. A NI-DM Object Model was then constructed for each information type.
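
To make the csv/text pattern concrete, the following is a minimal sketch of how rows of a statistics table might be turned into provenance-style triples. It is not the NI-DM Object Model itself: the `ex:` identifiers, the two-column table, and the `stats_to_triples` helper are hypothetical, and real FS statistics files carry more columns and header lines. Only the `prov:` terms (Entity, Activity, SoftwareAgent, wasGeneratedBy, wasAssociatedWith) come from the W3C PROV vocabulary.

```python
import csv
import io

# Hypothetical, simplified stand-in for a FreeSurfer statistics table.
STATS_CSV = """structure,volume_mm3
Left-Hippocampus,4100.5
Right-Hippocampus,4230.0
"""


def stats_to_triples(subject_id, stats_text):
    """Turn each row of a stats table into PROV-style (s, p, o) triples."""
    # One activity per participant: the processing run that produced the stats.
    activity = f"ex:recon-all_{subject_id}"
    triples = [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:wasAssociatedWith", "ex:FreeSurfer_5.1"),
        ("ex:FreeSurfer_5.1", "rdf:type", "prov:SoftwareAgent"),
    ]
    # One entity per table row, linked back to the activity that generated it.
    for row in csv.DictReader(io.StringIO(stats_text)):
        entity = f"ex:{subject_id}_{row['structure']}"
        triples += [
            (entity, "rdf:type", "prov:Entity"),
            (entity, "prov:wasGeneratedBy", activity),
            (entity, "ex:structure", row["structure"]),        # assumed property
            (entity, "ex:volume_mm3", float(row["volume_mm3"])),  # assumed property
        ]
    return triples


triples = stats_to_triples("sub001", STATS_CSV)
for triple in triples:
    print(triple)
```

Keeping every statistic as its own entity, rather than one record per file, is one possible design choice; it lets a query address a single measurement and trace it back to the run that produced it.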

Results

Three deliverables resulted from this effort. First, we defined NI-DM Object Models that represent information derived from the XNAT database schema, the FS standard subject directory structure, and the contents of FS statistics files (i.e., csv/text files). These Object Models were expressed in a set of IPython Notebooks to demonstrate the encoding process [6]. Second, the ADHD200 dataset was used to instantiate a Linked Data/RDF [7, 8] representation of the NI-DM Object types, each of which was uploaded into an RDF database (Figure 1). This representation is designed to capture data, associated metadata and provenance to allow for distributed storage and federated query. Third, we developed several queries in SPARQL [9], the query language for Linked Data, to evaluate the information retrieval capabilities of NI-DM. Two types of queries were successfully implemented: single-data-source queries and multi-data-source federated queries [10]. Using these queries, we federated data sources and retrieved 1) participant demographics, 2) file resources and 3) anatomical statistics.

Discussion

The work presented here is being performed in the context of many other related efforts in defining a terminology for brain imaging and creating ontologies that capture relationships in these vocabularies. By leveraging RDF, we broaden the range of biomedical information resources included in the Linked Data enterprise, including existing services and libraries that can simplify query generation and speed up response times. We believe this distributed model will show its usefulness before being fully adopted by the community. We have focused here on demonstrating the utility of NI-DM in the context of brain imaging, particularly in the representation of data processed by FS, but the benefits of the data model will grow as more brain imaging object models are designed for additional analysis packages (e.g., FSL, SPM) and derived datatypes.
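
The multi-data-source retrieval described in the Results can be illustrated with a toy sketch. It is a stand-in written in plain Python, not SPARQL and not the queries used in this work: two in-memory triple lists play the role of separate data sources, and a small pattern matcher joins them on a shared participant identifier. All `ex:` names, the sample values, and the `match` and `query` helpers are hypothetical. In SPARQL 1.1, the corresponding cross-endpoint join would typically be written with the `SERVICE` keyword.

```python
# Two hypothetical sources, each a list of (subject, predicate, object) triples.
DEMOGRAPHICS = [
    ("ex:sub001", "ex:age", 11),
    ("ex:sub002", "ex:age", 14),
]
ANATOMY = [
    ("ex:stat1", "ex:participant", "ex:sub001"),
    ("ex:stat1", "ex:volume_mm3", 4100.5),
    ("ex:stat2", "ex:participant", "ex:sub002"),
    ("ex:stat2", "ex:volume_mm3", 3950.0),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match exactly.
            return None
    return new


def query(steps):
    """Evaluate (source, pattern) steps in order, joining on shared variables."""
    bindings = [{}]
    for source, pattern in steps:
        extended = []
        for binding in bindings:
            for triple in source:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Join demographics from one source with anatomical statistics from another.
rows = query([
    (DEMOGRAPHICS, ("?participant", "ex:age", "?age")),
    (ANATOMY, ("?stat", "ex:participant", "?participant")),
    (ANATOMY, ("?stat", "ex:volume_mm3", "?volume")),
])
for row in rows:
    print(row["?participant"], row["?age"], row["?volume"])
```

The join works only because both sources use the same identifier for a participant, which is the practical reason a shared data model matters for federation.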

Figure 1

Acknowledgements

This work was conducted with the Neuroimaging Task Force of the INCF Program on Standards for Data Sharing and the BIRN derived-data working group.
