Abstract

Background

We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development.

Description

The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations.

Conclusion

The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data, enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at:
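
To make the layering described above more concrete (a relational model, loader routines that populate it, and retrieval routines that wrap SQL calls), the following minimal Python sketch may help. It is not the Atlas code: the two-table schema, the function names, and the accession values are hypothetical, Python is used only for illustration, and an in-memory SQLite database stands in for the actual database server.

```python
import sqlite3

# Hypothetical, heavily simplified schema (an assumption for illustration;
# the actual Atlas data models are not reproduced here).
SCHEMA = """
CREATE TABLE sequence (
    accession TEXT PRIMARY KEY,
    taxon_id  INTEGER NOT NULL,
    residues  TEXT NOT NULL
);
CREATE TABLE interaction (
    accession_a TEXT NOT NULL REFERENCES sequence(accession),
    accession_b TEXT NOT NULL REFERENCES sequence(accession),
    source_db   TEXT NOT NULL
);
"""


def open_warehouse():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_sequences(conn, records):
    """Loader sketch: insert (accession, taxon_id, residues) tuples."""
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", records)


def load_interactions(conn, records):
    """Loader sketch: insert (accession_a, accession_b, source_db) tuples."""
    with conn:
        conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", records)


def interaction_partners(conn, accession):
    """Retrieval sketch: accessions recorded as interacting with `accession`."""
    rows = conn.execute(
        """
        SELECT accession_b FROM interaction WHERE accession_a = ?
        UNION
        SELECT accession_a FROM interaction WHERE accession_b = ?
        ORDER BY 1
        """,
        (accession, accession),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_warehouse()
    # Made-up accessions and residues, not real database entries.
    load_sequences(conn, [("A1", 9606, "MKT"), ("A2", 9606, "MAA"), ("A3", 10090, "MGG")])
    load_interactions(conn, [("A1", "A2", "BIND"), ("A3", "A1", "DIP")])
    print(interaction_partners(conn, "A1"))
```

In this sketch the foreign-key constraints play the role of "enforcing the relationships between data types": an interaction that refers to a sequence that has not been loaded is rejected by the database rather than by application code.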

Summary

Introduction

We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. Most public repositories of biological data focus on deriving and providing one particular type of data, be it biological sequences (e.g., GenBank [1], UniProt [2]), molecular interactions (the Biomolecular Interaction Network Database (BIND) [3,4,5], the Human Protein Reference Database (HPRD) [6]), or gene expression (the Stanford microarray database [7]). Integrating these disparate sources of data enables researchers to discover new associations between the data, or validate existing hypotheses. Using data from genomic sequences and annotations, mRNA expression, and subcellular localization, Mootha et al. were able to use bioinformatics approaches to identify one of the disease genes responsible for Leigh syndrome [8]. In another example of an integrative bioinformatics approach, Stuart et al. used existing publicly available data to generate hypotheses about the functional roles of gene sets [9]. These two examples illustrate the potential of querying integrated public data to reveal novel relationships.
