Abstract

Background
Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, but they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This makes the data difficult to access and often introduces errors when data from different resources are merged and integrated. Integration and management of systems biology data therefore remain very challenging.

Methods
To address this, we designed and developed a dedicated database system that tackles vital issues in data management and thereby facilitates data integration, modeling and analysis in systems biology within a single database. In addition, we implemented a yeast data repository as an integrated database environment operated by the database system. Two applications were implemented to demonstrate the extensibility and utility of the system. Both illustrate how users can access the database through the web query function and through dedicated scripts. These scripts address two sample cases: 1) detecting the pheromone pathway in protein interaction networks; and 2) finding metabolic reactions regulated by Snf1 kinase.

Results and conclusion
In this study we present the design of a database system that offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the integrated yeast data clearly demonstrate the value of a single database environment for systems biology research.

Highlights

  • Advances in high-throughput technologies have enabled extensive generation of multi-level omics data

  • The explosive growth in biological and biochemical data benefits systems biology research and has driven the development of diverse types of biological databases, such as GenBank [2], UniProt [3], SGD [4], HMDB [5], BioGRID [6], KEGG [7], ArrayExpress [8] and GEO [9]

  • Attempts have been made to resolve these key issues through the development of numerous data standards (e.g. SBML [11], CellML [12], PSI-MI [13], Biological Pathway Exchange (BioPAX) [14], Gene Ontology (GO) terms [15] and SBO [16]), the implementation of centralized and federated databases, and the proposal of design methodologies for software and databases (e.g. I-cubed [20] and [21])


Summary

Introduction

Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, but they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. We describe the development of a dedicated database system for handling multi-level data; it is part of an ongoing effort to serve researchers in systems biology and to provide alternative solutions to vital issues in data handling, data access and data integration within a single database. The database schema is adapted from BioPAX and implemented using an object-oriented approach, in which each piece of real-world information is represented as an object with its own attributes and a variety of relationships to other objects. This approach suits biological information, which is inherently heterogeneous and complex [24]. The database API was developed in C++ and includes a library that provides the core functions for managing and interacting with the system.

