The Cell Cycle Ontology: an application ontology supporting the study of cell cycle control

Erick Antezana, Mikel Egaña, Robert Stevens, Bernard De Baets, Martin Kuiper, Vladimir Mironov, Ward Blondé

doi:10.1038/npre.2009.3563.1

Abstract

The Cell Cycle Ontology (CCO) is an application ontology that automatically captures and integrates detailed knowledge of the cell cycle process by combining, interlinking and enriching knowledge from various sources. CCO uses Semantic Web technologies and is accessible via the web for browsing, visualisation, advanced querying and computational reasoning. CCO facilitates a detailed analysis of the molecular network components involved in the cell cycle. Through querying and automated reasoning, it may provide new hypotheses that help steer a systems biology approach to building biological networks. The ontology is available at http://www.cellcycleontology.org. It can be explored visually via the BioPortal, the Ontology Lookup Service, the Ontology Online service or the DIAMONDS platform.

The Cell Cycle Ontology

The Cell Cycle Ontology captures detailed information on the cell cycle process, in the form of terms and relationships, by combining representations from several public sources [1]. CCO covers four model organisms (H. sapiens, A. thaliana, S. pombe and S. cerevisiae), with one ontology per organism plus one integrated ontology. It is an application ontology supplied as an integrated turnkey system for exploratory analysis, advanced querying and automated reasoning.

CCO holds more than 13,000 concepts and 30 types of relationships. It comprises data from existing resources such as the Gene Ontology (GO), the Relations Ontology (RO), the IntAct database (MI), the NCBI taxonomy and the UniProt Knowledge Base, as well as orthology data. An automatic pipeline periodically rebuilds CCO from scratch. First, existing ontologies (GO, RO, MI and in-house ones) are automatically fetched, integrated and merged, producing a core cell cycle ontology. Next, organism-specific protein and gene data from UniProt and the GO Annotation files are added, generating four organism-specific ontologies. These four ontologies are then merged, and further terms are added from an ontology built automatically by running OrthoMCL on the cell cycle proteins.

Formats and queries

CCO is built in the OBO flat file format (OBOF) with ONTO-PERL and subsequently exported to other formats [2]. CCO is available in OBOF, RDF, XML, OWL, GML and DOT. The Semantic Web formats RDF and OWL make it possible to query CCO. Through a SPARQL endpoint, complex queries can be formulated against the RDF version, such as “retrieve all the core cell cycle proteins in S. cerevisiae that are located in the cytoplasm and that have a hydrolysis-related function”.

Relational closures are pre-computed in the RDF triple store by running SPARUL update queries over CCO and Metarel. This allows very simple and responsive queries over long chains of relations in CCO.

Finally, during the maintenance phase, the OWL version is semantically enriched: Ontology Design Patterns are included using the Ontology Pre-Processor Language. The resulting CCO is designed to provide a richer view of the cell cycle regulatory process, in particular by accommodating the intrinsic dynamics of this process.

References

1. Antezana E, Egaña M, Blondé W et al. The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biology, 2009, 10:5
2. Antezana E, Egaña M, De Baets B, Kuiper M, Mironov V. ONTO-PERL: an API supporting the development and analysis of bio-ontologies. Bioinformatics, 2008, pp. 885–887.
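
Pre-computing relational closures means that a query no longer has to walk chains such as "cytosol part_of cytoplasm" at query time. The sketch below illustrates the idea in plain Python rather than with SPARUL and Metarel: it applies composition rules to a toy set of triples until a fixed point is reached, then answers a simplified version of the example query with plain membership tests. All identifiers (protA, hydrolase_activity, in_taxon, and so on) and the composition rules are hypothetical, chosen for illustration only; they are not CCO terms.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object); not real CCO content.
TRIPLES = {
    ("protA", "is_a", "core_cell_cycle_protein"),
    ("protA", "in_taxon", "S_cerevisiae"),
    ("protA", "located_in", "cytosol"),
    ("protA", "has_function", "ATPase_activity"),
    ("protB", "is_a", "core_cell_cycle_protein"),
    ("protB", "in_taxon", "S_cerevisiae"),
    ("protB", "located_in", "nucleus"),
    ("protB", "has_function", "ATPase_activity"),
    ("protC", "is_a", "core_cell_cycle_protein"),
    ("protC", "in_taxon", "H_sapiens"),
    ("protC", "located_in", "cytoplasm"),
    ("protC", "has_function", "hydrolase_activity"),
    # Simplified, GO-like term hierarchy.
    ("cytosol", "part_of", "cytoplasm"),
    ("ATPase_activity", "is_a", "hydrolase_activity"),
}

# Hypothetical composition rules: (r1, r2) -> r means "s r1 m" and "m r2 o" imply "s r o".
RULES = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("located_in", "part_of"): "located_in",
    ("has_function", "is_a"): "has_function",
}

def close(triples):
    """Apply RULES until no new triple appears (a fixed point) and return the closed set."""
    closed = set(triples)
    while True:
        objects = defaultdict(set)  # (subject, relation) -> set of objects
        for s, r, o in closed:
            objects[(s, r)].add(o)
        new = set()
        for s, r1, m in closed:
            for (a, b), r in RULES.items():
                if a == r1:
                    for o in objects.get((m, b), ()):
                        if (s, r, o) not in closed:
                            new.add((s, r, o))
        if not new:
            return closed
        closed |= new

def query(kb):
    """Core cell cycle proteins in S. cerevisiae, located in the cytoplasm, with a hydrolase function."""
    return sorted(
        s for s, r, o in kb
        if r == "is_a" and o == "core_cell_cycle_protein"
        and (s, "in_taxon", "S_cerevisiae") in kb
        and (s, "located_in", "cytoplasm") in kb
        and (s, "has_function", "hydrolase_activity") in kb
    )

print(query(TRIPLES))         # [] because the raw data only mention cytosol and ATPase activity
print(query(close(TRIPLES)))  # ['protA']
```

Once the closure is stored, the query itself needs no recursion, which mirrors the motivation given above for pre-computing closures in the triple store.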

Highlights

Got this novel gene X: does it interact with cell division cycle 2 kinase?
Is the endoreduplication model consistent with the current …?
Within the EU FP6 project DIAMONDS (LSHG-CT-2004512143), one of the objectives was to build a data integration platform dedicated to cell cycle biology.
The Cell Cycle Ontology was chosen as the data integration paradigm.

Summary

Objective

To capture knowledge about the cell cycle process (including its dynamic facets), to promote sharing and reuse, and to enable better computational integration with existing resources (Semantic Web). The ultimate aim is to support the evaluation and generation of hypotheses about cell cycle regulation via reasoning services.

Target organisms: S. cerevisiae, S. pombe, A. thaliana and H. sapiens.

Example: «Cyclin B (w1) is located in Cytoplasm (w2) during Interphase (w3)». CCO should capture the semantics and spatio-temporal relationships (Fig 1) of cell cycle components (proteins, genes, cellular locations, phases, ...). OBO and OWL-DL were chosen as the formats for representing the knowledge. RACER is mainly used to check data consistency and to perform classification.
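
A statement like the Cyclin B example links three entities at once, so it cannot be written as a single binary relation; one common workaround is to reify it as a record with three slots. The sketch below does that and adds a toy consistency check (disjoint categories plus slot typing). It is a deliberately simplified stand-in, not a description-logic reasoner such as RACER, and the LocatedDuring record, category names and term assignments are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocatedDuring:
    """Hypothetical reified statement: protein (w1) is located in location (w2) during phase (w3)."""
    protein: str
    location: str
    phase: str

# Hypothetical category assignments for a few terms.
TYPES = {
    "Cyclin B": {"protein"},
    "Cytoplasm": {"cellular_component"},
    "Nucleus": {"cellular_component"},
    "Interphase": {"cell_cycle_phase"},
    "Mitosis": {"cell_cycle_phase"},
}

# Pairs of categories that no single term may belong to at the same time.
DISJOINT = [
    ("protein", "cellular_component"),
    ("protein", "cell_cycle_phase"),
    ("cellular_component", "cell_cycle_phase"),
]

def check(statements, types):
    """Return a list of inconsistencies; an empty list means none were found."""
    problems = []
    # Disjointness: a term must not fall into two mutually exclusive categories.
    for term, categories in types.items():
        for a, b in DISJOINT:
            if a in categories and b in categories:
                problems.append(f"{term} is both {a} and {b}")
    # Slot typing: each slot of a statement must hold a term of the expected category.
    for st in statements:
        slots = ((st.protein, "protein"),
                 (st.location, "cellular_component"),
                 (st.phase, "cell_cycle_phase"))
        for value, expected in slots:
            if expected not in types.get(value, set()):
                problems.append(f"{value} should be a {expected} in {st}")
    return problems

ok = LocatedDuring("Cyclin B", "Cytoplasm", "Interphase")
swapped = LocatedDuring("Cyclin B", "Interphase", "Cytoplasm")
print(check([ok], TYPES))             # []
print(len(check([swapped], TYPES)))   # 2
```

Swapping the location and phase slots produces two typing errors, which is the kind of mistake a consistency check is meant to surface.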

Data integration pipeline
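
The build pipeline outlined in the abstract (fetch and merge source ontologies into a core, add organism-specific data, merge the organism ontologies, export to other formats) can be sketched as follows. This is not the ONTO-PERL implementation: it is a minimal Python sketch that assumes a tiny subset of the OBO flat file format (id, name, is_a and relationship tags), hypothetical protein identifiers (protA, protB, protX) and hypothetical in-house terms (HYP:0000001, HYP:0000002). Only the GO and NCBI taxon identifiers are real. The OrthoMCL orthology step is omitted.

```python
# Hypothetical sketch of a CCO-like build pipeline; not the actual ONTO-PERL code.

def parse_obo(text):
    """Parse [Term] stanzas into {id: {"name": str, "rels": {(relation, target)}}}.

    Handles only the id, name, is_a and relationship tags of the OBO flat file format.
    """
    terms, current, in_term = {}, None, False
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! comment"
        if line.startswith("["):             # stanza header, e.g. [Term] or [Typedef]
            in_term, current = line == "[Term]", None
        elif in_term and ":" in line:
            tag, value = (part.strip() for part in line.split(":", 1))
            if tag == "id":
                current = terms.setdefault(value, {"name": None, "rels": set()})
            elif current is None:
                continue                     # OBO requires id first; skip stray tags
            elif tag == "name":
                current["name"] = value
            elif tag == "is_a":
                current["rels"].add(("is_a", value))
            elif tag == "relationship":      # e.g. "relationship: part_of GO:0000278"
                rel, target = value.split(None, 1)
                current["rels"].add((rel, target.strip()))
    return terms

def merge(*ontologies):
    """Union of terms: first non-empty name wins, relations are unioned (inputs are not mutated)."""
    merged = {}
    for onto in ontologies:
        for tid, term in onto.items():
            entry = merged.setdefault(tid, {"name": None, "rels": set()})
            entry["name"] = entry["name"] or term["name"]
            entry["rels"] |= term["rels"]
    return merged

def add_organism(core, annotations, taxon):
    """Copy the core and add proteins from (protein, term) pairs annotated to core terms only."""
    onto = merge(core)
    for protein, term_id in annotations:
        if term_id in core:
            entry = onto.setdefault(protein, {"name": protein, "rels": set()})
            entry["rels"].add(("participates_in", term_id))
            entry["rels"].add(("in_taxon", taxon))
    return onto

def to_dot(onto):
    """Export as a Graphviz DOT digraph, one labelled edge per relation."""
    lines = ["digraph cco {"]
    for tid in sorted(onto):
        lines.append(f'  "{tid}" [label="{onto[tid]["name"] or tid}"];')
        for rel, target in sorted(onto[tid]["rels"]):
            lines.append(f'  "{tid}" -> "{target}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)

GO_OBO = """\
format-version: 1.2

[Term]
id: GO:0007049
name: cell cycle

[Term]
id: GO:0000278
name: mitotic cell cycle
is_a: GO:0007049 ! cell cycle
"""

INHOUSE_OBO = """\
[Term]
id: HYP:0000001
name: hypothetical in-house term
relationship: part_of GO:0000278
"""

core = merge(parse_obo(GO_OBO), parse_obo(INHOUSE_OBO))
yeast = add_organism(core, [("protA", "GO:0000278"), ("protX", "HYP:0000002")], "NCBITaxon:4932")
human = add_organism(core, [("protB", "HYP:0000001")], "NCBITaxon:9606")
integrated = merge(yeast, human)
print(to_dot(integrated))
```

Filtering annotations against the core ontology keeps each organism-specific ontology focused on cell cycle terms, which is why protX (annotated to a term outside the core) is dropped.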

Reasoning results

Conclusions and Results

Future work