GO2Vec: transforming GO terms and proteins to vector representations via graph embeddings

Xiaoshi Zhong, Rama Kaalia, Jagath C Rajapakse

doi:10.1186/s12864-019-6272-2

Abstract

Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some research exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions.

Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures.

Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.

Highlights

Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins
GO includes three categories of ontologies: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF); each category is organized as a directed acyclic graph (DAG) and is referred to as a GO graph, where a node denotes a GO term and an edge denotes a kind of relationship between terms
We conducted two kinds of experiments to evaluate the quality of the learned vectors of GO2Vec: (1) evaluation of protein similarities on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions (PPI) on Yeast and Human networks
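The graph structure described in these highlights can be pictured with a small sketch. It is an illustration only, not the GO2Vec implementation: the term IDs (`t1`, `t2`, ...), the protein IDs, and the helper names `ancestors` and `undirected_edges` are hypothetical. It stores a toy ontology as a DAG, walks up to a term's ancestors, and merges term-term edges with protein-term annotation edges into a single undirected edge set, which is the kind of input a graph embedding method could consume.

```python
from itertools import chain

# Toy ontology: each term maps to its parent terms.
# All IDs are hypothetical placeholders, not real GO terms.
PARENTS = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1", "t2"],  # in a DAG, a node may have several parents
    "t4": ["t3"],
}

# Toy annotations: protein -> terms it is annotated with (hypothetical).
ANNOTATIONS = {
    "P1": ["t3"],
    "P2": ["t4", "t2"],
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def undirected_edges(parents=PARENTS, annotations=ANNOTATIONS):
    """Merge term-term and protein-term links into one undirected edge set."""
    term_edges = ((child, p) for child, ps in parents.items() for p in ps)
    annot_edges = ((prot, t) for prot, ts in annotations.items() for t in ts)
    # frozenset drops edge direction: {a, b} equals {b, a}
    return {frozenset(edge) for edge in chain(term_edges, annot_edges)}


edges = undirected_edges()
```

Dropping edge direction here mirrors the abstract's mention of undirected GO and GOA graphs; how the actual method builds and weights its graphs is not specified in this excerpt.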

Summary

Introduction

Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous methods of estimating the semantic similarity of GO terms are based on information content (IC). These pioneering methods [5,6,7] and their variants [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] compute the semantic similarity between two GO terms either from their distances to the closest common ancestor term in the GO DAG or from statistics associated with their common ancestor terms. These methods have been successful in computing GO term similarity over the past two decades.
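To make the IC-based idea concrete, here is a minimal sketch of a Resnik-style measure, a well-known member of this family: the similarity of two terms is the information content of their most informative common ancestor, where IC(t) = -log p(t) and p(t) is the fraction of annotations that fall on t or its descendants. The toy DAG, the annotation counts, and the function names are assumptions made for illustration; the excerpt does not say which specific measures the cited works use.

```python
import math

# Toy ontology (hypothetical IDs): term -> parent terms.
PARENTS = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1", "t2"],
    "t4": ["t3"],
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {"root": 0, "t1": 2, "t2": 3, "t3": 4, "t4": 1}


def ancestors_inclusive(term):
    """Return `term` together with all of its ancestors."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def information_content():
    """IC(t) = log(total / freq(t)), i.e. -log p(t)."""
    # An annotation to a term also counts toward each of its ancestors.
    freq = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors_inclusive(term):
            freq[anc] += n
    total = freq["root"]
    return {t: math.log(total / f) for t, f in freq.items() if f > 0}


IC = information_content()


def resnik(a, b):
    """IC of the most informative common ancestor of terms a and b."""
    common = ancestors_inclusive(a) & ancestors_inclusive(b)
    return max(IC.get(t, 0.0) for t in common)
```

In this toy data, `t1` and `t2` share only the root, so their similarity is zero, while `t3` and `t4` share the more specific ancestor `t3` and score higher.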

Journal: BMC Genomics
Publication Date: Dec 1, 2019
Citations: 21
License type: open-access
