Abstract

Background: The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO).

Results: We computed the TO for randomly selected gene pairs from the mouse genome. For comparison, we implemented six previously reported semantic similarity measures that share the feature of using computation of probabilities of terms to infer information content, in addition to three vector-based approaches and a normalized version of the TO measure. We find that the overlap measure is highly correlated with the others but differs in detail. TO is at least as good a predictor of sequence similarity as the other measures. We further show that term overlap may avoid some problems that affect the probability-based measures. Term overlap is also much faster to compute than the information content-based measures.

Conclusion: Our experiments suggest that term overlap can serve as a simple and fast alternative to other approaches which use explicit information content estimation or require complex pre-calculations, while also avoiding problems that some other measures may encounter.

Highlights

  • The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes

  • Information content and semantic similarity measures: several of the measures we considered require the computation of the information content of each Gene Ontology (GO) term

  • A variant method we considered is the normalized term overlap (NTO), in which the term overlap score is divided by the annotation set size for the gene with the lower number of GO annotations
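
The two overlap measures named in the highlights can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each gene's annotations are already available as a set of GO term identifiers; how those sets are built is not stated here. The function names, the example identifiers, and the choice to return 0 for a gene with no annotations are all assumptions made for illustration. The `information_content` helper uses the common estimate IC(t) = -log p(t), with p(t) taken as the fraction of annotations that use term t; the measures compared in the paper may estimate p(t) differently.

```python
import math
from typing import Dict, Set


def term_overlap(terms_a: Set[str], terms_b: Set[str]) -> int:
    """Term overlap (TO): the number of GO terms shared by two annotation sets."""
    return len(terms_a & terms_b)


def normalized_term_overlap(terms_a: Set[str], terms_b: Set[str]) -> float:
    """Normalized term overlap (NTO): TO divided by the size of the smaller set."""
    smaller = min(len(terms_a), len(terms_b))
    if smaller == 0:
        # Assumption: a gene with no annotations gets a score of 0.
        return 0.0
    return term_overlap(terms_a, terms_b) / smaller


def information_content(term_counts: Dict[str, int], total: int) -> Dict[str, float]:
    """IC(t) = -log p(t), with p(t) estimated as count / total (assumed estimator)."""
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}


# Made-up annotation sets; the identifiers only serve as set members.
gene_a = {"GO:0000001", "GO:0000002", "GO:0000003"}
gene_b = {"GO:0000001", "GO:0000002", "GO:0000004", "GO:0000005"}

to_score = term_overlap(gene_a, gene_b)
nto_score = normalized_term_overlap(gene_a, gene_b)
print(to_score, nto_score)
```

Both overlap scores need only a set intersection per gene pair, whereas an information content-based measure first needs a pass over the whole annotation corpus to obtain the term counts.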

Summary

Introduction

The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. We evaluate a simple measure of semantic similarity, term overlap (TO). Many genes have been functionally characterized by experimental methods, sequencing efforts, and high-throughput techniques, and as a consequence those genes appear in public databases annotated with terms or concepts representative of their deduced function or biological role in the cell. The current work examines the behaviour of various semantic similarity measures that have been proposed, including one that has not been previously considered in comparisons.

