A Semantic Cross-Species Derived Data Management Application

David B Keator, Tallie Z Baram, Hal Stern, Fariba Fana, Jinran Chen, Steven L Small, Nolan Nichols

doi:10.5334/dsj-2017-045


Open Access

https://doi.org/10.5334/dsj-2017-045


Abstract

Managing dynamic information in large multi-site, multi-species, and multi-discipline consortia is a challenging task for data management applications. In academic research studies, informatics teams often aim to build applications that provide extract-transform-load (ETL) functionality to archive and catalog source data collected by the research teams. In consortia that cross species and methodological or scientific domains, developers face added complexity: interfaces must supply data in a usable fashion and make intuitive sense to scientists from dramatically different backgrounds. Further, reusing source data from outside one’s scientific domain is fraught with ambiguities in understanding the data types, the analysis methodologies, and how to combine the data with those from other research teams. We report on the design, implementation, and performance of a semantic data management application to support the NIMH-funded Conte Center at the University of California, Irvine. The Center is testing a theory of the consequences of “fragmented” (unpredictable, high entropy) early-life experiences on adolescent cognitive and emotional outcomes in both humans and rodents. It employs cross-species neuroimaging, epigenomic, molecular, and neuroanatomical approaches to assess the potential consequences of fragmented, unpredictable experience on brain structure and circuitry. To address this multi-technology, multi-species approach, the system uses semantic web techniques based on the Neuroimaging Data Model (NIDM) to facilitate data ETL functionality. We find that this approach enables a low-cost, easy-to-maintain, and semantically meaningful information management system that lets the diverse research teams access and use the data.

Highlights

Managing dynamic information in large multi-site, multi-species, and multi-discipline consortia is a challenging task for data management applications
In designing the Conte Center information resource, the informatics group initially evaluated each core project team on three points: its readiness to capture and store data associated with its project, the appropriateness of its planned data formats for sharing data with the other projects and cores, and which specific aspects of its data would be used by other Center projects
We evaluated three publicly available data management systems with the ability to capture many of the needed data types: the eXtensible Neuroimaging Archive Toolkit (XNAT) (Marcus et al., 2007), the Human Imaging Database (HID) (Ozyurt et al., 2010), and the Neuroinformatics Database (NiDB) (Book et al., 2013)