Automatically transforming pre- to post-composed phenotypes: EQ-lising HPO and MP

Anika Oellrich, Dietrich Rebholz-Schuhmann, Christoph Grabmüller

doi:10.1186/2041-1480-4-29


Open Access

https://doi.org/10.1186/2041-1480-4-29


Abstract

Background: Large-scale mutagenesis projects are ongoing to improve our understanding of the pathology, and subsequently the treatment, of diseases. Such projects record not only the genotype but also phenotype descriptions of the genetically modified organisms under investigation. Thus far, phenotype data is stored in species-specific databases that lack coherence and interoperability in their phenotype representations. One suggestion to overcome this lack of integration is the use of Entity-Quality (EQ) statements. However, a reliable automated transformation of the phenotype annotations from the databases into EQ statements is still missing.

Results: Here, we report on our ongoing efforts to develop a method (called EQ-liser) for the automated generation of EQ representations from phenotype ontology concept labels. We implemented the suggested method in a prototype and applied it to a subset of Mammalian Phenotype Ontology (MP) and Human Phenotype Ontology (HPO) concepts. In the case of MP, we were able to identify the correct EQ representation in over 52% of structure and process phenotypes. However, applying the EQ-liser prototype to HPO yields a correct EQ representation in only 13.3% of the investigated cases.

Conclusions: With the application of the prototype to two phenotype ontologies, we were able to identify common patterns of mistakes when generating the EQ representation. Correcting these mistakes will pave the way to a species-independent solution to automatically derive EQ representations from phenotype ontology concept labels. Furthermore, we were able to identify inconsistencies in the existing manually defined EQ representations of current phenotype ontologies. Correcting these inconsistencies will improve the quality of the manually defined EQ statements.

Highlights

Large-scale mutagenesis projects are ongoing to improve our understanding about the pathology and subsequently the treatment of diseases
Phenotype descriptions from such mutagenesis experiments are kept in species-specific Model Organism Databases (MODs) to ensure that the representation of the phenotype data is well-structured in support of further research in comparative phenomics [3]
The entities as well as the qualities have to be matched to ontological concepts that are provided by other Open Biological and Biomedical Ontologies (OBO) Foundry ontologies
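
To make the matching of entities and qualities concrete, here is a minimal Python sketch of how a pre-composed phenotype label might be split into an EQ statement. It is not the EQ-liser implementation: the lexicons, the placeholder identifiers (`ENTITY:…`, `QUALITY:…`), and the names `EQStatement`, `longest_match`, and `decompose` are all assumptions made for illustration, and the matching is a naive longest-phrase dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy lexicons. The identifiers are made-up placeholders, NOT real
# ontology IDs; a real system would load entity and quality concepts
# from OBO Foundry ontologies instead.
ENTITY_LEXICON: Dict[str, str] = {
    "heart": "ENTITY:0001",
    "kidney": "ENTITY:0002",
}
QUALITY_LEXICON: Dict[str, str] = {
    "enlarged": "QUALITY:0001",
    "absent": "QUALITY:0002",
}


@dataclass(frozen=True)
class EQStatement:
    """A post-composed phenotype: an entity paired with a quality."""
    entity: Optional[str]
    quality: Optional[str]


def longest_match(label: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Return the ID of the longest contiguous phrase of `label` found in `lexicon`."""
    words = label.lower().split()
    # Try longer phrases first so multi-word concepts win over single words.
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase in lexicon:
                return lexicon[phrase]
    return None


def decompose(label: str) -> EQStatement:
    """Split a pre-composed label into an (entity, quality) pair."""
    return EQStatement(
        entity=longest_match(label, ENTITY_LEXICON),
        quality=longest_match(label, QUALITY_LEXICON),
    )
```

With these toy lexicons, `decompose("enlarged heart")` pairs the placeholder entity for "heart" with the placeholder quality for "enlarged", while a label whose words are in neither lexicon yields an `EQStatement` with both fields set to `None`.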

Summary

Introduction

Large-scale mutagenesis projects are ongoing to improve our understanding of the pathology, and subsequently the treatment, of diseases. Advances in sequencing technologies have opened up new ways for the systematic exploration of species-specific phenotypic traits linked to selected mutations of a given genome. For example, the International Mouse Phenotyping Consortium (IMPC) systematically analyses the mouse genome to this end [1,2]. Phenotype descriptions from such mutagenesis experiments are kept in species-specific Model Organism Databases (MODs) to ensure that the representation of the phenotype data is well-structured in support of further research in comparative phenomics [3]. These studies would profit even more if more data were integrated into this framework.

Objectives

Methods

Results

Conclusion