Abstract 5282: A cloud-enabled open source data management platform supporting a federated research and development organization

Lauren Intagliata, Paul Ramirez, Garth McGrath, Nipurn Doshi, Giuseppe Totaro, Selina Chu, Maureen Cronin, Chris Mattmann, Shivika Thapar, Daniel Civello, Michael Livstone

doi:10.1158/1538-7445.am2016-5282

Abstract

Biopharmaceutical R&D organizations characterize drug candidate target effects and modes of action and create molecular models of target diseases. These data-intensive activities are informed by vast data resources including publicly available data, internally generated data and partnered private data collections. However, the rapid evolution of computing, data management tools, and analytical and visualization methods, together with the complexity of data types and the data volumes that must be accommodated, presents significant technical and logistical hurdles. It is particularly difficult for a geographically dispersed R&D organization to make data resources easily available to scientists for search, visualization and exploration. Nevertheless, this is required for R&D scientists to gain insight into disease and drug mechanisms and to capture the knowledge needed to sustain the scientific enterprise. Standardized commercial solutions to R&D data challenges are unattractive since they require significant resource investment in platform configuration, user training and system maintenance. Relying on them necessarily delays the adoption of newly emerging technologies and, because of the investment already made in existing systems, creates an incentive not to adopt alternatives. In contrast, our solution to R&D data demands was to build a cloud-deployed data platform using state-of-the-art tools developed and maintained by the open source software community at the Apache Software Foundation. Partnering with academic data scientists, we selected the best available tools to fit our specific needs. We integrated them into a platform accessible to our federated R&D scientific community while allowing the system to be freely modified and updated on demand to meet evolving user requirements.
Priorities for our data platform are to ingest, secure and index R&D source data of all types, make these indexed data assets available to computational scientists for analysis and provide faceted search capability based on a comprehensive metadata model. Three products, LabKey Server, Apache OODT and ISATools, have been combined into a scientific data management system that provides a unified data resource enhanced by a search platform powered by Apache Solr. The platform supports both internally generated data and data imported from public, contracted or partnered sources. All data are available for interactive exploration by our R&D community, accessed via integrated search, analysis and visualization tools. Deployment of this system to our R&D organization has been met with enthusiastic adoption. Feedback and requests for system enhancements and additional capabilities are rapidly addressed in this open source environment, leading to further adoption among the R&D scientists and providing the basis for accessible, stable institutional knowledge collections.

Citation Format: Lauren Intagliata, Selina Chu, Garth McGrath, Giuseppe Totaro, Daniel Civello, Nipurn Doshi, Shivika Thapar, Michael Livstone, Chris Mattmann, Paul Ramirez, Maureen Cronin. A cloud-enabled open source data management platform supporting a federated research and development organization. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5282.
