The effects of shared information on semantic calculations in the gene ontology

Paul W Bible, Hong-Wei Sun, Maria I Morasso, Rasiah Loganantharaj, Lai Wei

doi:10.1016/j.csbj.2017.01.009

Abstract

The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts and then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results show that the choice of ontology type most strongly influences performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement results from information gained by combining ontology types, as ensemble methods within each GO type offer no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open-source C++ library with a Python interface (github.com/paulbible/ggtk).
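As a rough illustration of how an SI value can be substituted into the Resnik, Lin, and Jiang-Conrath measures, the following minimal sketch uses the standard textbook forms of the three measures and takes SI to be the information content (IC) of the most informative common ancestor. The toy ontology, the term names, and the annotation probabilities are hypothetical, and the code does not use or reproduce the GGTK API.

```python
import math

# Hypothetical toy ontology (not real GO terms): child -> set of parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical annotation probabilities p(t); IC(t) = -log p(t).
PROB = {"root": 1.0, "A": 0.5, "B": 0.4, "C": 0.2, "D": 0.1}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content of a term from its annotation probability."""
    return -math.log(PROB[term])


def si_mica(a, b):
    """One possible SI: the IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a, b, si=si_mica):
    """Resnik similarity is the shared information itself."""
    return si(a, b)


def lin(a, b, si=si_mica):
    """Lin similarity normalizes SI by the IC of both terms."""
    denom = ic(a) + ic(b)
    # Assumed convention: return 0.0 when both terms carry no information.
    return 2.0 * si(a, b) / denom if denom > 0 else 0.0


def jiang_conrath_distance(a, b, si=si_mica):
    """Jiang-Conrath distance: IC of both terms minus twice the SI."""
    return ic(a) + ic(b) - 2.0 * si(a, b)
```

Because each measure takes the SI function as a parameter, a different SI approach can be swapped in by passing another callable with the same `(a, b)` signature, for example `lin("C", "D", si=my_other_si)`, where `my_other_si` is a placeholder name.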

Highlights

Researchers developed the gene ontology (GO) to provide a structured vocabulary that consistently describes the characteristics of genes and proteins across different organisms [1,2].
Despite requiring over 8 million more calculations, the molecular functions (MF) processes completed faster than the cellular components (CC) processes. These results suggest that the topology of the GO graph plays an important role in determining the execution speed of semantic algorithms and that functions of the raw number of terms in an ontology may not accurately reflect their complexity.
The Jaccard-based term-set level measures are known to be more efficient. These findings illustrate that the increased time complexity of the graph-based similarity measure (GraSM) methods could be computationally prohibitive in some situations, and the problem may worsen as GO graph complexity grows.
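To make the efficiency point concrete, a Jaccard-style term-set measure can be sketched as a single set intersection and union over the terms annotated to two genes. This is only a minimal sketch of the general Jaccard index: the term identifiers below are placeholders, and whether annotations are first extended with ancestor terms is an assumption left to the caller.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index between two collections of annotated terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Assumed convention: two unannotated genes have similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical annotations for two genes (placeholder term IDs).
gene_1_terms = {"T1", "T2", "T3"}
gene_2_terms = {"T2", "T3", "T4"}
similarity = jaccard(gene_1_terms, gene_2_terms)
```

The whole comparison costs one pass over the two sets, whereas pairwise term measures must evaluate every term pair between the two genes and, for graph-based SI methods, traverse the ontology for each pair.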

Summary

Introduction

Researchers developed the gene ontology (GO) to provide a structured vocabulary that consistently describes the characteristics of genes and proteins across different organisms [1,2]. Specific GO terms in this vocabulary annotate proteins by specifying the biological processes in which they participate, their enzymatic and molecular functions, and their location within the cell. Terms are related to one another using a directed acyclic graph (DAG). These relationships serve to clarify terminology, for example by identifying when one term may be a more specialized form of another. Three separate ontologies exist, each providing a DAG of terms and relationships, to describe biological processes (BP), molecular functions (MF), and cellular components (CC). The Gene Ontology Consortium makes frequent updates to GO, modifying the relationship structure and adding or removing terms to better reflect the current understanding of biological functions.
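The DAG structure described above can be sketched with a small data structure in which each term stores links to its more general parents. The example below is a minimal illustration only: the `Term` class, the `X:n` identifiers, and the helper name are hypothetical and are not taken from GO or from GGTK.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    namespace: str  # one of "BP", "MF", "CC"
    parents: set = field(default_factory=set)  # IDs of more general terms


def is_specialization_of(ontology, term_id, other_id):
    """True if other_id is reachable from term_id by following parent links."""
    stack = list(ontology[term_id].parents)
    seen = set()
    while stack:
        current = stack.pop()
        if current == other_id:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(ontology[current].parents)
    return False


# Hypothetical four-term ontology; X:4 has two parents, which a tree
# could not represent but a DAG can.
ONTOLOGY = {
    "X:1": Term("X:1", "BP"),
    "X:2": Term("X:2", "BP", {"X:1"}),
    "X:3": Term("X:3", "BP", {"X:1"}),
    "X:4": Term("X:4", "BP", {"X:2", "X:3"}),
}
```

Here `is_specialization_of(ONTOLOGY, "X:4", "X:1")` follows parent links upward from `X:4`, while sibling terms such as `X:2` and `X:3` are not specializations of each other.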

Journal: Computational and Structural Biotechnology Journal
Publication Date: Jan 1, 2017
Citations: 3
License type: CC BY
