Abstract

The recent increase in Gene Ontology (GO) annotations of proteins in existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was paired with a specific functional similarity measure, depending on its conception and the applications for which it was designed. However, it is not clear whether the functional similarity measure associated with a given approach is the most appropriate one for a given biological data set or application, that is, whether it achieves the best performance compared to other functional similarity measures for the application under consideration. We show that, in general, the functional similarity measure commonly used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each term IC and semantic similarity approach. These comparisons should help researchers choose the most appropriate measure for the biological application under consideration.

Highlights

  • The advancement of high-throughput biology technologies has resulted in a large increase in functional data, eliciting the need for relevant tools that help analyze and extract information from these data

  • Protein functional similarity measures are derived either directly from the Gene Ontology (GO) term information content (IC), a numerical value scoring the description and specificity of a GO term using its position in the GO directed acyclic graph (DAG), or from GO term semantic similarity scores conveying information shared by two GO terms in the GO DAG [8]

  • Each semantic similarity approach or functional measure was defined for a specific purpose with a specific application in mind, especially in the context of topology-based approaches, where each approach was paired with its own functional similarity measure, depending on its conception and the applications for which it was designed

Summary

Introduction

The advancement of high-throughput biology technologies has resulted in a large increase in functional data, eliciting the need for relevant tools that help analyze and extract information from these data. Several functional similarity measures that quantify similarity between proteins based on their GO annotations have been introduced and successfully applied in many biomedical and biological applications [2, 8]. These measures allow the integration of the biological knowledge contained in the GO structure [9], and have contributed to the improvement of biological analyses [2]. In order to quantify the information content (IC) value of a given term, several approaches have been proposed, each depending on how the concept of ‘specificity’ is conceived in the context of the GO DAG structure. These approaches are partitioned into two main families, namely the annotation-based and topology-based families, and have been widely used to compare GO terms in the GO DAG and proteins at the functional level using their GO annotations.
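To make the distinction between term IC, term semantic similarity, and protein functional similarity concrete, the following is a minimal sketch of one annotation-based pipeline. It is not the evaluation procedure of this paper. It assumes the widely used Resnik-style definitions: the IC of a term is the negative log of the fraction of proteins annotated to it or to its descendants, the similarity of two terms is the IC of their most informative common ancestor, and two proteins are compared with one common variant of the best-match average. The toy ontology, the protein names, and every function name below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology (not real GO identifiers): term -> set of parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Hypothetical direct annotations: protein -> set of terms.
ANNOTATIONS = {
    "p1": {"A1"},
    "p2": {"A2"},
    "p3": {"AB"},
    "p4": {"B"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_ic():
    """Annotation-based IC: log(N / n_t), where n_t counts the proteins
    annotated to term t or to any of its descendants (assumed definition)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)  # annotations propagate up the DAG
        for t in covered:
            counts[t] += 1
    n = len(ANNOTATIONS)
    # Terms with no annotated protein have undefined IC and are left out.
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = term_ic()


def resnik(t1, t2):
    """Term semantic similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common)


def best_match_average(p, q):
    """Protein functional similarity: mean of the row-wise and column-wise
    best-match averages over the two annotation sets (one common variant)."""
    tp, tq = sorted(ANNOTATIONS[p]), sorted(ANNOTATIONS[q])
    rows = [max(resnik(a, b) for b in tq) for a in tp]
    cols = [max(resnik(a, b) for a in tp) for b in tq]
    return (sum(rows) / len(rows) + sum(cols) / len(cols)) / 2


if __name__ == "__main__":
    for pair in [("p1", "p2"), ("p1", "p4"), ("p3", "p4")]:
        print(pair, round(best_match_average(*pair), 4))
```

Swapping `best_match_average` for another aggregation, such as the maximum or the plain average over all term pairs, changes the protein-level score while leaving the term IC untouched, which is the kind of choice the comparison described above is concerned with. A topology-based approach would replace `term_ic` with a score computed from the DAG structure alone.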

Methods
Results
Conclusion
