IntelliGO: a new vector-based semantic similarity measure including annotation origin.

Sidahmed Benabderrahmane, Amedeo Napoli, Olivier Poch, Malika Smail-Tabbone, Marie-Dominique Devignes

doi:10.1186/1471-2105-11-588

Abstract

Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes).

Results: We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to previously published measures.

Conclusions: The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power which suggests it will be useful for functional clustering.

Availability: An on-line version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/

Highlights

The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation.
The coefficients assigned to each vector component (GO term) are composed of two measures analogous to the tf-idf measures used for document retrieval [43].
It can be verified that GO terms which are frequently used to annotate genes in a corpus will display a low Inverse Annotation Frequency (IAF) value, whereas GO terms that are rarely used will display a high IAF which reflects their specificity and their potentially high contribution to vector comparison.
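
The behaviour of the IAF described above can be illustrated with a minimal sketch. It assumes an idf-style definition, IAF(t) = log(total genes in the corpus / genes annotated with t), and it assumes that a term's coefficient is the product of an evidence-code weight and its IAF. Neither formula is stated in the text above; the function names, the toy term labels and the weight values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def inverse_annotation_frequency(corpus):
    """Return {term: IAF} for a corpus given as {gene: set of GO terms}.

    Assumption: IAF(t) = log(N / n_t), by analogy with idf, where N is the
    number of genes in the corpus and n_t the number of genes annotated with t.
    """
    total_genes = len(corpus)
    genes_per_term = Counter(term for terms in corpus.values() for term in set(terms))
    return {term: math.log(total_genes / n) for term, n in genes_per_term.items()}


def term_coefficients(gene_annotations, iaf, evidence_weights):
    """Return {term: coefficient} for one gene given as {term: evidence code}.

    Assumption: coefficient = evidence-code weight * IAF (tf-idf-like product).
    Evidence codes without a listed weight get weight 0.0.
    """
    return {
        term: evidence_weights.get(code, 0.0) * iaf[term]
        for term, code in gene_annotations.items()
    }


# Toy corpus with hypothetical genes and term labels.
corpus = {
    "g1": {"T_common", "T_rare"},
    "g2": {"T_common"},
    "g3": {"T_common"},
    "g4": {"T_common"},
}
iaf = inverse_annotation_frequency(corpus)

# Hypothetical evidence-code weights (illustrative values only).
weights = {"EXP": 1.0, "IEA": 0.5}
coeffs = term_coefficients({"T_common": "IEA", "T_rare": "EXP"}, iaf, weights)
```

In this toy corpus the term carried by every gene gets an IAF of zero and so contributes nothing to the vector, while the term carried by a single gene gets the highest IAF, which matches the qualitative behaviour described in the highlight.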

Summary

Introduction

1.1 Gene annotation

The Gene Ontology (GO) has become one of the most important and useful resources in bioinformatics [1]. This ontology of about 30,000 terms is organized as a controlled vocabulary describing the biological process (BP), molecular function (MF), and cellular component (CC) aspects of gene annotation, called GO aspects [2]. Each aspect is structured as a rooted directed acyclic graph (rDAG): each rDAG has a unique root node, relationships between nodes are oriented, and there are no cycles, i.e. no path starts and ends at the same node.
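
The three rDAG properties listed above (a unique root, oriented relationships, no cycles) can be checked mechanically. The following is a minimal sketch, assuming the graph is given as (child, parent) pairs; the function name and the toy node labels are hypothetical, and the recursive traversal is meant for small illustrative graphs rather than the full ontology.

```python
from collections import defaultdict


def is_rooted_dag(edges):
    """Check that oriented (child, parent) edges form a rooted DAG.

    Returns True only if exactly one node has no parent (the root) and
    no path starts and ends at the same node.
    """
    parents = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        nodes.update((child, parent))

    # Unique root: exactly one node without any parent.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False

    # Cycle detection by depth-first search along child -> parent edges.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = dict.fromkeys(nodes, UNSEEN)

    def has_cycle(node):
        state[node] = IN_PROGRESS
        for parent in parents[node]:
            if state[parent] == IN_PROGRESS:
                return True  # reached a node on the current path again
            if state[parent] == UNSEEN and has_cycle(parent):
                return True
        state[node] = DONE
        return False

    return not any(state[n] == UNSEEN and has_cycle(n) for n in nodes)


# Toy graphs with hypothetical node labels.
diamond = [("child1", "root"), ("child2", "root"),
           ("leaf", "child1"), ("leaf", "child2")]
two_roots = [("a", "r1"), ("a", "r2")]
with_cycle = [("a", "root"), ("b", "c"), ("c", "b")]
```

The `diamond` graph shows why a DAG rather than a tree is needed: a term may have several parents and still satisfy all three properties, whereas `two_roots` and `with_cycle` each violate one of them.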

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2010
Citations: 137
License type: CC BY 2.0
