Abstract

Compared with sequence and structure similarity, functional similarity is more informative for understanding the biological roles and functions of genes. Many important applications in computational molecular biology require functional similarity, such as gene clustering, protein function prediction, protein interaction evaluation and disease gene prioritization. Gene Ontology (GO) is now widely used as the basis for measuring gene functional similarity. Some existing methods combine the semantic similarity scores of single term pairs to estimate gene functional similarity, whereas others compare terms in groups to measure it. However, these methods may make error-prone judgments about gene functional similarity, and measuring it reliably remains a challenge. We propose a novel method called SORA to measure gene functional similarity in the GO context. First, SORA computes the information content (IC) of a term using semantic specificity and coverage. Second, SORA measures the IC of a term set by combining the inherited and extended IC of the terms based on the structure of GO. Finally, SORA estimates gene functional similarity using the IC overlap ratio of term sets. SORA is evaluated against five state-of-the-art methods in the field on the public platform for collaborative evaluation of GO-based semantic similarity measures. Careful comparisons show that SORA is generally superior to the other methods. Further analysis suggests that it primarily benefits from the structure of GO, which carries expressive information about gene function. SORA offers an effective and reliable way to compare gene function. The web service of SORA is freely available at http://nclab.hit.edu.cn/SORA/
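
The three steps described in the abstract (term IC, term-set IC, overlap ratio) can be illustrated with a minimal Python sketch. It is not SORA's actual formulation, which the abstract does not specify. The toy ontology, the annotation counts and all function names are hypothetical. The term IC here is the common annotation-frequency IC rather than SORA's specificity-and-coverage IC, and the term-set IC is a plain sum over the ancestor closure (a simGIC-style ratio) rather than SORA's inherited and extended IC.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a small DAG, not real GO).
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a"},
    "d": {"a", "b"},
}

# Hypothetical annotation counts per term, assumed already propagated to ancestors.
ANNOTATION_COUNTS = {"root": 100, "a": 40, "b": 50, "c": 10, "d": 5}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms):
    """Close a term set under the ancestor relation."""
    closed = set()
    for term in terms:
        closed |= ancestors(term)
    return closed


def term_ic(term):
    """Annotation-frequency IC (an assumption, not SORA's definition)."""
    return -math.log(ANNOTATION_COUNTS[term] / ANNOTATION_COUNTS["root"])


def term_set_ic(terms):
    """IC of a term set: summed IC over its ancestor closure (an assumption)."""
    return sum(term_ic(term) for term in closure(terms))


def overlap_ratio(terms_a, terms_b):
    """IC of the shared closure divided by IC of the combined closure."""
    closed_a, closed_b = closure(terms_a), closure(terms_b)
    union_ic = sum(term_ic(t) for t in closed_a | closed_b)
    if union_ic == 0:
        # Both sets contain only zero-IC terms such as the root.
        return 1.0 if closed_a == closed_b else 0.0
    shared_ic = sum(term_ic(t) for t in closed_a & closed_b)
    return shared_ic / union_ic


if __name__ == "__main__":
    gene_x = {"c"}  # hypothetical annotation sets for two genes
    gene_y = {"d"}
    print(overlap_ratio(gene_x, gene_y))
```

In this sketch two genes annotated with the same terms score 1, and the score drops as the IC carried by unshared terms grows. Terms near the root contribute little because their IC is close to zero.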

Highlights

  • In recent years, gene functional similarity has become a major focus of biological research

  • Terms in the Gene Ontology (GO) are classified as electronically assigned terms (E-terms) and manually assigned terms (M-terms)

Summary

Introduction

Gene functional similarity has become a major focus of biological research because it is important for a variety of applications, such as gene clustering (Brameier and Wiuf, 2007; Cho et al, 2009; Qu and Xu, 2004; Yang et al, 2008), protein interaction prediction and evaluation (Li et al, 2008; Jain and Bader, 2010; Schlicker et al, 2007) and gene function prediction (Chen and Xu, 2004; Jensen et al, 2003; Nariai et al, 2007). Many methods based on semantic similarity have been put forward to estimate gene functional similarity. These methods can generally be classified into two categories: pairwise and group-wise (Pesquita et al, 2009a).

