A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

Juliana S Bernardes, Alessandra Carbone, Gerson Zaverucha

doi:10.1186/1471-2105-12-83

Abstract

Background: Remote homology detection is a hard computational problem. Most approaches have trained computational models using either full protein sequences or multiple sequence alignments (MSA) with all positions included. However, for proteins in the "twilight zone", only some segments of the sequences (motifs) are conserved. We introduce a novel logical representation that captures the physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this representation, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs), which are then used to train propositional models such as decision trees and support vector machines (SVM).

Results: We use the SCOP database to perform our experiments, evaluating protein recognition within the same superfamily. Our results show that our methodology, when using SVM, performs significantly better than some state-of-the-art methods and comparably to others. Moreover, our method provides a comprehensible set of logical rules that can help to understand what determines a protein's function.

Conclusions: Selecting only the most frequent patterns is an effective strategy for remote homology detection. This is made possible by a suitable first-order logical representation of homologous properties, and by a set of frequent patterns, found by an ILP system, that summarizes essential features of protein function.
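The overall pipeline in the abstract (encode sequences, mine frequent patterns, turn the patterns into features for a propositional learner) can be illustrated with a much simpler stand-in. The sketch below is not ILP and not the authors' logical representation: it assumes a hypothetical five-class physico-chemical alphabet (`PROPERTY_CLASS`), treats contiguous k-mers over that alphabet as the "patterns", keeps those occurring in at least a `min_support` fraction of the positive (same-superfamily) sequences, and then propositionalizes each sequence into a binary vector with one column per pattern. All function names and parameters are illustrative.

```python
from collections import Counter

# Hypothetical coarse physico-chemical alphabet (an assumption for illustration,
# not the property set used in the paper).
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "h"),  # hydrophobic
    **dict.fromkeys("STNQY", "p"),     # polar, uncharged
    **dict.fromkeys("KRH", "b"),       # basic (positively charged)
    **dict.fromkeys("DE", "a"),        # acidic (negatively charged)
    **dict.fromkeys("GP", "s"),        # special: glycine, proline
}


def to_property_string(seq):
    """Map each residue to its property class; unknown residues become 'x'."""
    return "".join(PROPERTY_CLASS.get(aa, "x") for aa in seq.upper())


def kmers(s, k):
    """Set of all contiguous substrings of length k (empty if len(s) < k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def frequent_patterns(positives, k=3, min_support=0.5):
    """Property k-mers found in at least min_support of the positive sequences.

    Each pattern is counted at most once per sequence (kmers returns a set).
    """
    counts = Counter()
    for seq in positives:
        counts.update(kmers(to_property_string(seq), k))
    threshold = min_support * len(positives)
    return sorted(p for p, c in counts.items() if c >= threshold)


def propositionalize(seqs, patterns, k=3):
    """One binary feature per pattern: 1 if the sequence contains it, else 0."""
    rows = []
    for seq in seqs:
        present = kmers(to_property_string(seq), k)
        rows.append([1 if p in present else 0 for p in patterns])
    return rows


if __name__ == "__main__":
    pats = frequent_patterns(["MKDEL", "AKDEV", "GRDEL"], k=3, min_support=0.5)
    print(pats)  # ['aah', 'baa', 'hba']
    print(propositionalize(["MKDEL", "GGGGG"], pats))  # [[1, 1, 1], [0, 0, 0]]
```

The resulting rows could be passed to any propositional learner, such as a decision tree or an SVM. ILP differs from this stand-in in that it searches over first-order clauses, so its patterns can combine alignment-derived predicates and non-adjacent positions rather than only contiguous k-mers.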

Highlights

Remote homology detection is a hard computational problem
We have demonstrated that good performance can be achieved with first-order logical representations of protein sequences based on conserved amino acid positions and on conserved physicochemical positions in multiple sequence alignments (MSA)
We call Seq the models trained from sequential properties only, and Alncons the models trained from conserved amino acid positions in an MSA
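The highlights distinguish conserved amino acid positions from conserved physicochemical positions in an MSA. A minimal sketch of how such columns might be located, assuming a simple majority-fraction rule: a column counts as conserved when its most frequent symbol covers at least `threshold` of the sequences, with gaps counting against conservation. The 0.8 threshold, the `PROPERTY_CLASS` grouping and the function names are hypothetical choices for illustration, not the paper's criteria.

```python
from collections import Counter

# Hypothetical coarse physico-chemical grouping (assumption, not the paper's).
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "h"),  # hydrophobic
    **dict.fromkeys("STNQY", "p"),     # polar, uncharged
    **dict.fromkeys("KRH", "b"),       # basic
    **dict.fromkeys("DE", "a"),        # acidic
    **dict.fromkeys("GP", "s"),        # special
}


def property_of(aa):
    """Property class of a residue; unknown residues map to 'x'."""
    return PROPERTY_CLASS.get(aa.upper(), "x")


def conserved_positions(msa, threshold=0.8, key=lambda aa: aa):
    """Indices of MSA columns whose most common symbol (after key) is frequent.

    The fraction is taken over all sequences, so gaps ('-') lower conservation.
    With the default key, residues must be identical; with key=property_of,
    residues only need to share a property class.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for j in range(length):
        symbols = [key(row[j]) for row in msa if row[j] != "-"]
        if not symbols:
            continue  # all-gap column
        _, top_count = Counter(symbols).most_common(1)[0]
        if top_count / len(msa) >= threshold:
            conserved.append(j)
    return conserved


if __name__ == "__main__":
    msa = ["AKDL", "AKEL", "ARD-", "AKDI"]
    print(conserved_positions(msa))                   # [0]
    print(conserved_positions(msa, key=property_of))  # [0, 1, 2]
```

Passing `property_of` as `key` switches from residue identity to property-class identity, so a column such as K/K/R/K, which fails the amino acid test, passes the physicochemical one.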

Summary

Introduction

Remote homology detection is a hard computational problem. Most approaches have trained computational models using either full protein sequences or multiple sequence alignments (MSA) with all positions included. An important problem in computational biology is the detection of remote homologs, that is, proteins that share a common ancestor but whose primary sequences have diverged significantly over evolutionary history. Remote homology detection is the problem of detecting homology when sequence identity is low, frequently below 30%. The problem is both important and hard: methods that identify homologs are essential for functional and comparative genomics. Today, homology detection methods are valuable aids for sequence annotation and for guiding laboratory experiments.
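The 30% figure refers to pairwise sequence identity. A minimal sketch of one common way to compute it from two already aligned sequences, assuming identity is measured over columns where neither sequence has a gap; other conventions (dividing by alignment length or by the shorter sequence) are also in use and give different numbers. The `cutoff` in the hypothetical `in_twilight_zone` helper is the rough 30% threshold mentioned above, not a sharp boundary.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Identity = identical residues / columns where neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def in_twilight_zone(a, b, cutoff=30.0):
    """True when identity falls below the (approximate) cutoff."""
    return percent_identity(a, b) < cutoff


if __name__ == "__main__":
    print(percent_identity("ACDEFGHIKL", "ACDQWTVRSM"))  # 30.0
    print(percent_identity("AC-DE", "ACWDQ"))            # 75.0
    print(in_twilight_zone("AAAAAAAAAA", "ACDEFGHIKL"))  # True
```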


Journal: BMC Bioinformatics
Publication Date: Mar 23, 2011
Citations: 58
License type: CC BY 2.0


Similar Papers

Remote homology detection incorporating the context of physicochemical properties
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine, Vol. 45, 27 Nov 2013

Reducing dimensionality in remote homology detection using predicted contact maps
Oscar Bedoya ... Irene Tischer
Computers in Biology and Medicine, Vol. 59, 31 Jan 2015

Learning relational rule from examples that are neither positive nor negative
Ryutaro Ichise ... Masayuki Numao
Systems and Computers in Japan, Vol. 32, 26 Nov 2001

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers
Hilmi M Muda ... Razib M Othman
Computers in Biology and Medicine, Vol. 41, 25 Jun 2011
