Abstract

Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component of a disease is discovered only after its phenotype has been clearly defined. In recent years, many technologies for systematically generating phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher gene functions. However, there have been relatively few efforts to use phenotype data beyond individual genotype-phenotype relationships.

Results: We present a study in which we use a large set of phenotype data (in textual form) to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators of biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters to predict gene function by transferring annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g., a group containing all available antennal Drosophila odorant receptors despite inconsistent GO annotations.

Conclusion: Because phenotypes visibly reflect genetic activity, they are inherently useful for inferring new gene functions. Systematically analyzing these data on a large scale therefore offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.
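The abstract describes a two-step idea: cluster genes by the text of their phenotype descriptions, then carry GO terms over from annotated genes to their cluster mates. The sketch below illustrates that idea under explicit assumptions. It uses TF-IDF vectors with cosine similarity and a simple single-link threshold clustering, which are swapped-in choices since this excerpt does not say which clustering algorithm the authors used. The helper names (`tfidf_vectors`, `threshold_clusters`, `predict_terms`, `leave_one_out`), the majority-vote parameter `min_support`, and all gene names, descriptions and term labels are hypothetical. The evaluation is a leave-one-out variant of cross-validation and does not model the paper's "objective criteria" for selecting groups.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens; no stop-word removal or stemming.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return gene -> L2-normalized sparse TF-IDF vector (term -> weight)."""
    counts = {gene: Counter(tokenize(text)) for gene, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for gene, c in counts.items():
        vec = {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[gene] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(u, v):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def threshold_clusters(vectors, threshold):
    """Single-link clustering: link two genes if their cosine >= threshold."""
    genes = sorted(vectors)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(set)
    for g in genes:
        groups[find(g)].add(g)
    return list(groups.values())


def predict_terms(gene, cluster, annotations, min_support=0.5):
    """GO terms carried by >= min_support of the gene's annotated cluster mates."""
    mates = [m for m in cluster if m != gene and annotations.get(m)]
    if not mates:
        return set()
    votes = Counter(t for m in mates for t in annotations[m])
    return {t for t, k in votes.items() if k / len(mates) >= min_support}


def leave_one_out(clusters, annotations, min_support=0.5):
    """Micro-averaged precision/recall, hiding each annotated gene in turn."""
    tp = fp = fn = 0
    for cluster in clusters:
        for gene in cluster:
            truth = annotations.get(gene, set())
            if not truth:
                continue  # unannotated genes cannot be scored
            pred = predict_terms(gene, cluster, annotations, min_support)
            tp += len(pred & truth)
            fp += len(pred - truth)
            fn += len(truth - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: gene names, descriptions and term labels are made up.
TOY_PHENOTYPES = {
    "geneA": "loss of olfactory response to odorant in antenna",
    "geneB": "reduced olfactory response to odorant",
    "geneC": "olfactory response to odorant abolished in antenna",
    "geneD": "wing blistering and wing vein defects",
    "geneE": "wing vein loss and blistering",
}
TOY_GO = {
    "geneA": {"olfaction"},
    "geneB": {"olfaction"},
    "geneC": set(),  # unannotated: a candidate for annotation transfer
    "geneD": {"wing_development"},
    "geneE": {"wing_development", "vein_patterning"},
}

if __name__ == "__main__":
    clusters = threshold_clusters(tfidf_vectors(TOY_PHENOTYPES), threshold=0.2)
    print(sorted(sorted(c) for c in clusters))
    olf = next(c for c in clusters if "geneC" in c)
    print(predict_terms("geneC", olf, TOY_GO))  # {'olfaction'}
    print(leave_one_out(clusters, TOY_GO))  # (0.8, 0.8)
```

On the toy data the olfaction-related descriptions form one cluster and the wing-related ones another, so the unannotated `geneC` inherits the term shared by its cluster mates. Raising `min_support` trades recall for precision, which mirrors the precision/recall trade-off reported in the abstract only in spirit, not in its actual numbers.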

Highlights

  • Health and disease of organisms are reflected in their phenotypes

  • Another issue is that until very recently, no comprehensive set of phenotypes with associated genes was available. This issue has been partly addressed by the creation of phenotype databases, such as PhenomicDB [9,10] or PhenoGO [11]

  • To validate the biological usefulness of the created 'phenoclusters', we examined the relatedness of the genes in a cluster using several independent measures, i.e., protein-protein interactions (PPI) of associated proteins, functional annotations from the Gene Ontology (GO), and the co-occurrence of pairs of genes known to be responsible for identical phenotypes
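The highlight names the coherence measures but not how each was scored. As an illustration only, the sketch below assumes two simple per-cluster scores: the fraction of within-cluster gene pairs that appear in a set of known linked pairs (usable for protein-protein interactions or for pairs of genes known to cause the same phenotype), and the mean pairwise Jaccard similarity of the genes' GO term sets. The function names `pair_fraction` and `mean_go_jaccard` and all toy inputs are hypothetical, and these scores are not necessarily the metrics the authors used.

```python
from itertools import combinations


def pair_fraction(cluster, linked_pairs):
    """Fraction of unordered gene pairs in `cluster` found in `linked_pairs`.

    `linked_pairs` is a set of frozensets, e.g. protein-protein interactions
    or pairs of genes known to cause the same phenotype.
    """
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if frozenset((a, b)) in linked_pairs)
    return hits / len(pairs)


def mean_go_jaccard(cluster, annotations):
    """Mean Jaccard similarity of GO term sets over pairs of annotated genes."""
    annotated = sorted(g for g in cluster if annotations.get(g))
    pairs = list(combinations(annotated, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        sa, sb = annotations[a], annotations[b]
        total += len(sa & sb) / len(sa | sb)
    return total / len(pairs)


# Hypothetical toy inputs; names and term labels are placeholders.
TOY_PPI = {frozenset(("geneA", "geneB")), frozenset(("geneD", "geneE"))}
TOY_GO = {
    "geneA": {"olfaction"},
    "geneB": {"olfaction"},
    "geneD": {"wing_development"},
    "geneE": {"wing_development", "vein_patterning"},
}

if __name__ == "__main__":
    for cluster in ({"geneA", "geneB", "geneC"}, {"geneD", "geneE"}):
        print(sorted(cluster), pair_fraction(cluster, TOY_PPI),
              mean_go_jaccard(cluster, TOY_GO))
```

Scores like these are only meaningful relative to a baseline; one would typically compare them with the same scores computed for randomly drawn gene groups of matching size.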


Summary

Introduction

Health and disease of organisms are reflected in their phenotypes. Often, a genetic component of a disease is discovered only after its phenotype has been clearly defined. Due to the heterogeneity of phenotype descriptions, automatically analyzing phenotypes is a daunting yet relatively unexplored task. Adding to this problem, the term 'phenotype' itself is used for a broad variety of concepts, including descriptions of clinical diseases, characterizations of naturally occurring or experimentally generated mutants, RNAi screens, gene knock-out experiments, and sometimes even large-scale microarray gene expression data. This makes an integrated analysis of phenotypes from different experiments and laboratories hard [8]. Another issue is that until very recently, no comprehensive set of phenotypes with associated genes was available. This issue has been partly addressed by the creation of phenotype databases, such as PhenomicDB [9,10] or PhenoGO [11] (see [8] for a survey of available phenotype data sets).
