Abstract

Background: Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow us to identify additional features for model retrieval tasks, and enable the comparison of sets of models.

Results: In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of SBML models compiled from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable for determining characteristic features of arbitrary sets of models: the selected features vary depending on the underlying model set, and they are specific to the chosen model set. We show that the identified features map to concepts that are higher up in the ontology hierarchies than the concepts used for model annotation. Our analysis also reveals that the information content of ontology concepts and their usage in model annotations do not correlate.

Conclusions: Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword or model-to-model comparison.

Electronic supplementary material: The online version of this article (doi:10.1186/s13326-015-0014-4) contains supplementary material, which is available to authorized users.
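A claim such as "information content and annotation usage do not correlate" is typically checked with a rank correlation. The sketch below only illustrates that kind of check; it is not the authors' analysis. It assumes one common intrinsic information content variant (Seco et al.: IC(c) = 1 - log(|descendants(c)| + 1) / log(N)), uses Spearman's rho, and runs on a hypothetical toy hierarchy with made-up usage counts. The paper's actual IC measure may differ, and all function names are illustrative.

```python
import math
from typing import Dict, List, Set


def descendants(children: Dict[str, List[str]], concept: str) -> Set[str]:
    """All concepts strictly below `concept` (safe for DAGs with shared children)."""
    seen: Set[str] = set()
    stack = list(children.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children.get(c, []))
    return seen


def intrinsic_ic(children: Dict[str, List[str]], concepts: List[str]) -> Dict[str, float]:
    """Assumed intrinsic IC: leaves get 1.0, a root subsuming everything gets 0.0."""
    n = len(concepts)
    return {c: 1.0 - math.log(len(descendants(children, c)) + 1) / math.log(n)
            for c in concepts}


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy hierarchy (parent -> children) and hypothetical usage counts.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
concepts = ["root", "A", "B", "A1", "A2", "B1"]
usage = {"root": 0, "A": 12, "B": 1, "A1": 3, "A2": 40, "B1": 0}

ic = intrinsic_ic(children, concepts)
rho = spearman([ic[c] for c in concepts], [float(usage[c]) for c in concepts])
print(f"Spearman rho between IC and usage: {rho:.3f}")
```

A rank correlation is used here because usage counts in annotated model collections tend to be heavily skewed, and Spearman's rho only assumes a monotone relationship rather than a linear one.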

Highlights

  • Thanks to standardization efforts in Systems Biology [1], modelers today have access to high-quality, curated models in standard formats

  • We considered existing approaches for feature extraction in other areas, such as text classification, and selected document frequency as a method that is, to some extent, applicable to extracting a predefined number of features from sets of Systems Biology Markup Language (SBML) models

  • The following subsections explain our four methods for feature identification, based on the aforementioned feature extraction methods (Section “Implemented feature extraction methods”); discuss their applicability to feature extraction from model sets (Section “Applicability of methods”); show the distribution of model annotations in BioModels Database (Section “Distribution of Systems Biology Ontology (SBO) concepts in SBML models”); and discuss the results obtained from two selected methods when applied to the abovementioned test sets (Section “Feature extraction from arbitrary model sets”)
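Document-frequency feature selection for a model set can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: each model is reduced to a set of annotation concept IDs, each annotation is propagated to all of its ancestors in an is_a hierarchy before counting, and the k concepts that annotate the most models are kept. The function names and the toy hierarchy (which borrows GO-like term names but not the real GO structure) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def ancestors_or_self(parents: Dict[str, List[str]], concept: str) -> Set[str]:
    """The concept plus every ancestor reachable via (hypothetical) is_a links."""
    seen = {concept}
    stack = [concept]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def top_k_by_document_frequency(
    models: Dict[str, Iterable[str]],
    parents: Dict[str, List[str]],
    k: int,
) -> List[Tuple[str, int]]:
    """Rank concepts by the number of models in the set they annotate.

    Each model counts at most once per concept (document frequency), and
    annotations are propagated to ancestors so that general concepts can win.
    """
    df: Counter = Counter()
    for annotations in models.values():
        expanded: Set[str] = set()
        for concept in annotations:
            expanded |= ancestors_or_self(parents, concept)
        df.update(expanded)
    # Descending frequency, then concept ID as a deterministic tie-break.
    return sorted(df.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy hierarchy (child -> parents) and model set.
parents = {
    "glycolysis": ["carbohydrate metabolism"],
    "gluconeogenesis": ["carbohydrate metabolism"],
    "carbohydrate metabolism": ["metabolic process"],
    "MAPK cascade": ["signal transduction"],
}
models = {
    "m1": ["glycolysis"],
    "m2": ["gluconeogenesis"],
    "m3": ["glycolysis", "MAPK cascade"],
}

print(top_k_by_document_frequency(models, parents, 3))
```

With this toy data, the general concepts "carbohydrate metabolism" and "metabolic process" each reach a document frequency of 3 and outrank the more specific "glycolysis" (2). This shows how propagating annotations can push selected features up the hierarchy; it is an illustration, not a reproduction of the paper's results.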


Summary

Introduction

Thanks to standardization efforts in Systems Biology [1], modelers today have access to high-quality, curated models in standard formats. Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Each concept is linked to other kinds of information, including many gene and protein keyword databases. An alternative search approach is ranked model retrieval [8]. Models and their annotations are mapped onto predefined model features. It is possible to create a characteristic term … We argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow us to identify additional features for model retrieval tasks, and enable the comparison of sets of models.
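To illustrate what mapping models onto predefined features enables in ranked retrieval, here is a minimal vector-space sketch. It assumes each model is represented as a binary vector over annotation concepts (stored as a set) and ranks models by cosine similarity to a query's concept set. This is a generic technique chosen for illustration; the approach cited as [8] may use different features or weighting, and the function names and toy models are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary feature vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_models(query: Set[str], models: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return (model ID, score) pairs ordered by decreasing similarity to the query."""
    scored = [(model_id, cosine(query, features)) for model_id, features in models.items()]
    # Ties are broken by model ID so the ranking is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical models, each mapped onto a set of annotation concepts.
models = {
    "m1": {"glycolysis", "ATP"},
    "m2": {"MAPK cascade"},
    "m3": {"glycolysis"},
}

for model_id, score in rank_models({"glycolysis"}, models):
    print(model_id, round(score, 3))
```

Cosine similarity normalizes for the number of annotations per model, so a model that is annotated only with the queried concept ranks above one that carries it among several others.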

