Identification of Divergent Protein Domains by Combining HMM-HMM Comparisons and Co-Occurrence Detection

Amel Ghouila, Isabelle Florent, Olivier Gascuel, Fatma Zahra Guerfali, Dhafer Laouini, Laurent Bréhélin, Nicolas Terrapon, Sadok Ben Yahia

doi:10.1371/journal.pone.0095275

Abstract

Identification of protein domains is a key step for understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs that are widely used to annotate proteins in sequenced organisms through sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Methods for HMM/HMM comparisons have recently been proposed and, in certain cases, have proved more sensitive than sequence/HMM approaches. However, these methods are usually not used for protein domain discovery at the genome scale, and the benefit that could be expected from applying them to this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not accurate enough to be used as a standalone complement to sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence (the general tendency of domains to appear preferentially alongside some favorite domains in proteins) to improve the accuracy of the approach. We show that combining HMM/HMM comparisons with co-occurrence domain detection boosts protein annotation. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of part of these predictions shows that they include several domain families that were missing in the two organisms. All new domain occurrences have been integrated into the EuPathDomains database, along with the GO annotations that can be deduced from them.
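The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a reference set of already-annotated proteins (each reduced to a set of domain names), counts how many proteins contain each pair of distinct domains, and "certifies" a candidate hit only if it pairs with a domain already known in the same protein. The raw count cutoff `min_count` is a hypothetical stand-in for whatever statistical criterion the actual method uses, and the domain names are placeholders.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotated_proteins):
    """Count how many reference proteins contain each unordered pair of distinct domains."""
    counts = Counter()
    for domains in annotated_proteins:
        # Sorting makes each pair a canonical (smaller, larger) tuple.
        for a, b in combinations(sorted(set(domains)), 2):
            counts[(a, b)] += 1
    return counts


def certify_candidates(known, candidates, counts, min_count=2):
    """Keep candidates that co-occur with at least one known domain of the same
    protein in at least `min_count` reference proteins (hypothetical threshold)."""
    certified = []
    for cand in candidates:
        for dom in known:
            if dom == cand:
                continue  # a repeat of a known domain does not certify itself
            pair = tuple(sorted((cand, dom)))
            if counts[pair] >= min_count:  # Counter returns 0 for unseen pairs
                certified.append(cand)
                break
    return certified


if __name__ == "__main__":
    # Placeholder reference annotations, not real Pfam data.
    reference = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C", "D"}]
    counts = cooccurrence_counts(reference)
    # Target protein already carries domain A; HMM/HMM comparison proposed B, C, D.
    print(certify_candidates({"A"}, ["B", "C", "D"], counts))  # ['B']
```

Only B is kept because the pair (A, B) appears in two reference proteins, whereas (A, C) appears once and (A, D) never. Raising `min_count` makes the filter stricter and trades sensitivity for accuracy.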

Highlights

With the continuous improvement of genome sequencing technologies, an increasing number of new genomes are emerging every day, enhancing basic knowledge of the diversity of organisms and providing valuable data for understanding their biology and evolutionary relationships.
The aim of this work is to boost Pfam domain predictions using profile/profile comparison in order to enrich our knowledge of the protein domain catalogue of two major pathogens, L. major and P. falciparum.
All Pfam domains that can be identified by HMMER with the recommended score thresholds are considered known in what follows; our aim is to identify new domain occurrences.
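The distinction between known and new occurrences can be made concrete with a small sketch. It assumes each hit is reduced to a protein identifier, a family name and 1-based inclusive coordinates (the `Hit` type is hypothetical), and it treats a candidate from HMM/HMM comparison as already known when a HMMER hit of the same family on the same protein covers at least half of the shorter of the two regions. The 50% cutoff (`min_overlap`) is an illustrative assumption, not a value taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    protein: str
    family: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_fraction(a, b):
    """Overlap length divided by the length of the shorter of the two hits."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return overlap / shorter


def new_occurrences(candidates, known, min_overlap=0.5):
    """Return candidate hits not already covered by a known hit of the same family."""
    known_by_protein = {}
    for k in known:
        known_by_protein.setdefault(k.protein, []).append(k)
    result = []
    for c in candidates:
        covered = any(
            k.family == c.family and overlap_fraction(c, k) >= min_overlap
            for k in known_by_protein.get(c.protein, [])
        )
        if not covered:
            result.append(c)
    return result


if __name__ == "__main__":
    # Placeholder hits, not real annotations.
    known = [Hit("P1", "FamA", 10, 100)]
    candidates = [
        Hit("P1", "FamA", 20, 90),    # same family, same region: already known
        Hit("P1", "FamB", 150, 220),  # different family: new
        Hit("P2", "FamA", 10, 100),   # different protein: new
    ]
    for hit in new_occurrences(candidates, known):
        print(hit)
```

In this sketch only same-family overlaps count as known; a stricter variant could also discard candidates that overlap a known domain of any family.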

Summary

Introduction

With the continuous improvement of genome sequencing technologies, an increasing number of new genomes are emerging every day, enhancing basic knowledge of the diversity of organisms and providing valuable data for understanding their biology and evolutionary relationships. However, because functional annotation tools have been developed from this wealth of unbalanced data, they show limits when applied to the exploration of divergent genomes [1,2]. Protein domains are closely tied to function: two thirds of mono-domain proteins that share the same domain have the same function, and 35% of multi-domain proteins that share one common domain present similar functions, a rate that rises to 80% when they share two common domains [4]. Protein domains also provide meaningful information for comparative genomics [5,6] and for studying protein-protein interactions [7].



Journal: PLoS ONE
Publication Date: Jun 5, 2014
Citations: 6
License type: CC BY 4.0


Similar Papers

EuPathDomains: The divergent domain database for eukaryotic pathogens
Amel Ghouila ... Laurent Bréhélin
Infection, Genetics and Evolution | VOL. 11 | 02 Oct 2010

Identification of Protein Domains by Shotgun Proteolysis
Daniel Christ ... Greg Winter
Journal of Molecular Biology | VOL. 358 | 13 Feb 2006

Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence.
Juliana Bernardes ... Catherine Vaquero
PLOS Computational Biology | VOL. 12 | 29 Jul 2016

De novo identification of essential protein domains from CRISPR-Cas9 tiling-sgRNA knockout screens
Wei He ... Mark T Bedford
Nature Communications | VOL. 10 | 04 Oct 2019
