A repository based on a dynamically extensible data model supporting multidisciplinary research in neuroscience

Luca Corradi, Gabriele Arnulfo, Andrea Schenone, Raffaele Ferrari, Parastoo Momeni, Marco M Fato, Ivan Porro, Michela Ferrara, Flavio Nobili

doi:10.1186/1472-6947-12-115

Abstract

Background: Robust, extensible and distributed databases integrating clinical, imaging and molecular data represent a substantial challenge for modern neuroscience. It is even more difficult to provide extensible software environments able to effectively target the rapidly changing data requirements and structures of research experiments. There is increasing demand from the neuroscience community for software tools addressing the following technical challenges: (i) supporting researchers in the medical field in carrying out data analysis using integrated bioinformatics services and tools; (ii) handling multimodal/multiscale data and metadata, enabling the injection of several different data types according to structured schemas; (iii) providing high extensibility, in order to address different requirements deriving from a large variety of applications simply through a user runtime configuration.

Methods: A dynamically extensible data structure supporting collaborative multidisciplinary research projects in neuroscience has been defined and implemented. We have considered extensibility issues from two different points of view. First, the improvement of data flexibility has been taken into account. This has been done through the development of a methodology for the dynamic creation and use of data types and related metadata, based on the definition of a “meta” data model. This way, users are not constrained to a set of predefined data types, and the model can be easily extended and applied to different contexts. Second, users have been enabled to easily customize and extend the experimental procedures in order to track each step of acquisition or analysis. This has been achieved through a process-event data structure, a multipurpose taxonomic schema composed of two generic main objects: events and processes. Then, a repository has been built based on this data model and structure, and deployed on distributed resources thanks to a Grid-based approach. Finally, data integration aspects have been addressed by providing the repository application with an efficient dynamic interface designed to enable the user both to easily query the data depending on the defined data types and to view all the data of every patient in an integrated and simple way.

Results: The results of our work have been twofold. First, a dynamically extensible data model has been implemented and tested based on a “meta” data model enabling users to define their own data types independently from the application context. This data model has allowed users to dynamically include additional data types without the need to rebuild the underlying database. Then a complex process-event data structure has been built, based on this data model, describing patient-centered diagnostic processes and merging information from data and metadata. Second, a repository implementing such a data structure has been deployed on a distributed Data Grid in order to provide scalability both in terms of data input and data storage and to exploit distributed data and computational approaches in order to share resources more efficiently. Moreover, data management has been made possible through a user-friendly web interface. The driving principle of not being forced to use preconfigured data types has been satisfied. It is up to users to dynamically configure the data model for the given experiment or data acquisition program, thus making it potentially suitable for customized applications.

Conclusions: Based on this repository, data management has been made possible through a user-friendly web interface. The driving principle of not being forced to use preconfigured data types has been satisfied. It is up to users to dynamically configure the data model for the given experiment or data acquisition program, thus making it potentially suitable for customized applications.
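
The two ideas described in the abstract, data types defined at runtime through a “meta” data model and a process-event structure that tracks acquisition and analysis steps, can be illustrated with a minimal in-memory sketch. This is not the repository's actual implementation: the class names (`DataType`, `Datum`, `Event`, `Process`, `TypeRegistry`), the field names and the placeholder values are all hypothetical, and persistence and Grid deployment are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataType:
    """A user-defined data type: a name plus the metadata fields it expects."""
    name: str
    fields: Dict[str, type]  # metadata field name -> expected Python type


@dataclass
class Datum:
    """One stored item, described by a runtime-defined DataType."""
    data_type: DataType
    metadata: Dict[str, Any]


@dataclass
class Event:
    """A single acquisition or analysis step and the data it produced."""
    label: str
    data: List[Datum] = field(default_factory=list)


@dataclass
class Process:
    """An ordered group of events, e.g. one patient's diagnostic work-up."""
    subject_id: str
    events: List[Event] = field(default_factory=list)


class TypeRegistry:
    """Holds data types created at runtime (hypothetical helper)."""

    def __init__(self) -> None:
        self._types: Dict[str, DataType] = {}

    def define(self, name: str, **fields: type) -> DataType:
        # Adding a type only adds a registry entry; nothing is rebuilt.
        data_type = DataType(name, dict(fields))
        self._types[name] = data_type
        return data_type

    def create(self, type_name: str, **metadata: Any) -> Datum:
        data_type = self._types[type_name]  # KeyError if never defined
        missing = set(data_type.fields) - set(metadata)
        extra = set(metadata) - set(data_type.fields)
        if missing or extra:
            raise ValueError(
                f"missing={sorted(missing)} unexpected={sorted(extra)}"
            )
        for key, expected in data_type.fields.items():
            if not isinstance(metadata[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return Datum(data_type, dict(metadata))


# Illustrative usage with placeholder values (not data from the study).
registry = TypeRegistry()
registry.define("MRI", scanner=str, field_strength_tesla=float)
registry.define("Genotype", gene=str, variant=str)

process = Process(subject_id="patient-001")
process.events.append(
    Event("imaging", [registry.create("MRI", scanner="demo",
                                      field_strength_tesla=1.5)])
)
process.events.append(
    Event("genotyping", [registry.create("Genotype", gene="EXAMPLE",
                                         variant="v1")])
)
```

In this sketch a new data type is just another registry entry, which mirrors the stated goal of adding types without rebuilding the underlying database; a real system would store the type definitions themselves as rows or documents.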

Highlights

Robust, extensible and distributed databases integrating clinical, imaging and molecular data represent a substantial challenge for modern neuroscience.
It is up to users to dynamically configure the data model for the given experiment or data acquisition program, making it potentially suitable for customized applications. Based on this repository, data management has been made possible through a user-friendly web interface.
Even though the architecture has been explicitly designed to address extensibility issues, both in terms of the data model and in terms of larger datasets and a more distributed hardware infrastructure, the scalability of our approach has not yet been tested, owing to the limited size of the experiment.

Summary

Introduction

Extensible and distributed databases integrating clinical, imaging and molecular data represent a substantial challenge for modern neuroscience. According to the International Neuroinformatics Coordination Facility (INCF), neuroinformatics is the research field that encompasses the organization of neuroscience data and the application of computational models and analytical tools in neuroscience [1]. In this regard, our approach is intended to provide a flexible and extensible data model, and its implementation in a software environment is aimed at supporting clinicians and researchers in managing multidisciplinary neuroinformatics projects.
