Abstract

Data integration plays a vital role in scientific research. In biomedical research, OMICS fields such as proteomics and pharmacogenomics, as well as newer fields like foodomics, have shown the need for larger datasets. As research projects require multiple data sources, mapping between these sources becomes necessary. The workflow systems and integration tools in use therefore need to process large amounts of data in heterogeneous formats, check for data source updates, and find suitable mapping methods to cross-reference entities from different databases. This article presents BioDWH2, an open-source, graph-based data warehouse and mapping tool capable of helping researchers with these issues. A workspace-centered approach allows project-specific data source selections, and Neo4j or GraphQL server tools enable quick access to the database for analysis. The BioDWH2 tools are available to the scientific community at https://github.com/BioDWH2.

Highlights

  • Most studies in life science research require data to conduct different kinds of analyses

  • OMICS fields such as proteomics and pharmacogenomics, as well as newer fields like foodomics, have shown the need for larger datasets

  • Imker surveyed the databases published in the Nucleic Acids Research (NAR) database issues and found that, as of 2018, 1700 databases had been covered over 25 years [1]

Introduction

Most studies in life science research require data to conduct different kinds of analyses. The growing opportunities of molecular information in the clinical context [6, 7] necessitate the integration of more high-quality data sources from other OMICS fields. This includes finding meaningful relationships between drugs, diseases, and their molecular bases, such as gene and protein variants, RNA regulation, and drug pathways. The integration and mapping of this information could provide an in-depth understanding of individual patient cases and reduce adverse drug reactions, moving towards personalized medicine. This growth in OMICS fields and data sources requires research projects to have a reliable and easy-to-use integration pipeline for data warehousing and information mapping. The goal is simple setup and execution with as little custom configuration as possible.

Related work
BioDWH2 workspace
Architecture
Program flow
Data source mapping
Data source implementations
Database access
Conclusion