Abstract

The query-template alignment of proteins is one of the most critical steps of template-based modeling methods used to predict the 3D structure of a query protein. This alignment can be interpreted as a temporal classification or structured prediction task. First order Conditional Random Fields have been proposed for protein alignment and have proven rather successful. Other popular structured prediction problems, such as speech or image classification, have benefited from higher order Conditional Random Fields because of the well known higher order correlations between their labels and features. In this paper, we propose and describe the use of higher order Conditional Random Fields for query-template protein alignment. Experiments carried out on different public datasets validate our proposal, especially on distantly related protein pairs, which are the most difficult to align.

Highlights

  • Proteins carry out most of the work in living cells and their functions are largely determined by their three dimensional (3D) structure which in turn is determined by the amino acid sequence [1]

  • We have described how to carry out query-template alignment of proteins using a Higher Order Conditional Random Field (HO-CRF)

  • We have based our proposal on previous developments regarding the use of first order CRFs for protein alignment [16, 18, 20] and the formulation of HO-CRFs [23, 24, 30]

Summary

Introduction

Proteins carry out most of the work in living cells and their functions (structure, enzyme, messenger, ...) are largely determined by their three dimensional (3D) structure, which in turn is determined by the amino acid sequence [1]. The rate at which new protein sequences become available is much faster than the rate at which their structure and function are known [3]. Machine Learning techniques are bringing about new methods that quickly and accurately predict the function [4, 5] and structure [6, 7] of proteins. Despite the great progress currently made on free modeling (FM) methods (mainly due to the incorporation of co-evolutionary information [2, 7, 9, 10]), FM remains computationally expensive (for long-length proteins) and most of the servers for protein structure prediction ...
