Computational discovery of direct associations between GO terms and protein domains

Seyed Ziaeddin Alborzi, Marie-Dominique Devignes, David W Ritchie

doi:10.1186/s12859-018-2380-2

Abstract

Background: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task.

Results: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively.

Conclusions: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
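The common-neighbour idea in the abstract can be illustrated with a small sketch. Proteins form the middle layer of a tripartite graph, and a GO term and a domain become candidates for a direct association when they share annotated proteins. The scoring function here (cosine similarity over protein sets), the function name `common_neighbour_scores` and the toy identifiers are all assumptions made for illustration; the abstract does not specify the scoring that CODAC actually uses.

```python
from collections import defaultdict
from math import sqrt


def common_neighbour_scores(protein_to_go, protein_to_domains):
    """Score (GO term, domain) pairs by the proteins they share.

    Assumption: cosine similarity of the two protein sets is used as the
    score. This is an illustrative choice, not the published CODAC score.
    """
    # Invert both annotation maps so each GO term and each domain
    # points at the set of proteins (the common-neighbour layer).
    go_to_proteins = defaultdict(set)
    domain_to_proteins = defaultdict(set)
    for protein, terms in protein_to_go.items():
        for term in terms:
            go_to_proteins[term].add(protein)
    for protein, domains in protein_to_domains.items():
        for domain in domains:
            domain_to_proteins[domain].add(protein)

    scores = {}
    for term, term_proteins in go_to_proteins.items():
        for domain, domain_proteins in domain_to_proteins.items():
            shared = len(term_proteins & domain_proteins)
            if shared:  # keep only pairs with at least one common neighbour
                scores[(term, domain)] = shared / sqrt(
                    len(term_proteins) * len(domain_proteins)
                )
    return scores


# Hypothetical toy annotations (not real GO or domain identifiers).
toy_go = {"P1": {"GO:A"}, "P2": {"GO:A"}, "P3": {"GO:B"}}
toy_domains = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}}
toy_scores = common_neighbour_scores(toy_go, toy_domains)
```

In this toy example the pair (GO:A, D1) receives the highest score because both annotate exactly the same proteins, while (GO:B, D1) is absent because the two share no protein. A real pipeline would also need a threshold for deciding which scored pairs count as associations.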

Highlights

Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology)
Our results show that the GO-domain associations discovered by this approach represent average increases of 15-, 41- and 25-fold in the number of edges in the corresponding bipartite graphs
All CATH and SCOP domain families were transformed into their corresponding superfamilies, and all Pfam “repeat” and “motif” domain types were discarded
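The preprocessing mentioned in the last highlight could look roughly like the sketch below. It assumes that SCOP identifiers follow the class.fold.superfamily.family pattern and that the first four dot-separated CATH levels identify a superfamily. The function names, the Pfam type labels and the example identifiers are hypothetical and are not taken from the paper.

```python
def scop_superfamily(sccs):
    """Truncate a SCOP identifier (class.fold.superfamily.family) to its superfamily."""
    return ".".join(sccs.split(".")[:3])


def cath_superfamily(cath_id):
    """Keep the first four CATH levels, assumed here to identify the superfamily."""
    return ".".join(cath_id.split(".")[:4])


def filter_pfam(entries, discarded_types=("repeat", "motif")):
    """Drop Pfam entries whose type is in discarded_types (case-insensitive).

    `entries` is a list of (accession, type) pairs; the type labels are an
    assumption about how the Pfam entry type is recorded in the input.
    """
    discarded = {t.lower() for t in discarded_types}
    return [acc for acc, pfam_type in entries if pfam_type.lower() not in discarded]


# Illustrative identifiers only.
print(scop_superfamily("a.1.1.2"))
print(cath_superfamily("3.40.50.300.1"))
print(filter_pfam([("PF_X", "Domain"), ("PF_Y", "Repeat"), ("PF_Z", "Motif")]))
```

Collapsing families to superfamilies in this way means that several family-level annotations can map to the same node in the graph, so duplicate edges would need to be merged afterwards.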

Summary

Introduction

Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). Protein functions are often performed by highly conserved structural regions identified from sequence or structure alignments, which may be classified into families of domains. One interesting exception is the dcGO database [6], which provides multiple ontological annotations (such as GO) for protein domains. We found that there are several manually curated GO-Pfam associations from InterPro [7] which are not present in dcGO. From the results of a previous version of our approach [8, 9], we estimated that dcGO associations can only annotate 43% of the unannotated structures in the Protein Databank (PDB) [10].

Journal: BMC Bioinformatics
Publication Date: Nov 1, 2018
Citations: 3
License type: open-access

Similar Papers

Global analysis of gene function in yeast by quantitative phenotypic profiling
James A Brown ... Nicola M Burrows
Molecular Systems Biology | VOL. 2
01 Jan 2006

BC4GO: a full-text corpus for the BioCreative IV GO task.
K Van Auken ... H.-M Muller
Database | VOL. 2014
28 Jul 2014

Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae
Shaowu Meng ... Thomas K Mitchell
BMC Microbiology | VOL. 9
01 Feb 2009

Associating Gene Ontology Terms with Pfam Protein Domains
Seyed Ziaeddin Alborzi ... David W Ritchie
01 Jan 2017
