Abstract
The query-template alignment of proteins is one of the most critical steps of template-based modeling methods used to predict the 3D structure of a query protein. This alignment can be interpreted as a temporal classification or structured prediction task, and first-order Conditional Random Fields have been proposed for protein alignment and have proven rather successful. Other popular structured prediction problems, such as speech or image classification, have benefited from higher-order Conditional Random Fields because of the well-known higher-order correlations that exist between their labels and features. In this paper, we propose and describe the use of higher-order Conditional Random Fields for query-template protein alignment. Experiments carried out on different public datasets validate our proposal, especially on distantly related protein pairs, which are the most difficult to align.
Highlights
Proteins carry out most of the work in living cells, and their functions are largely determined by their three-dimensional (3D) structure, which in turn is determined by the amino acid sequence [1]
We have described how to carry out query-template alignment of proteins using a Higher Order Conditional Random Field (HO-CRF)
We have based our proposal on previous developments regarding the use of first order CRFs for protein alignment [16, 18, 20] and the formulation of HO-CRFs [23, 24, 30]
Summary
Proteins carry out most of the work in living cells, and their functions (structure, enzyme, messenger, ...) are largely determined by their three-dimensional (3D) structure, which in turn is determined by the amino acid sequence [1]. The rate at which new protein sequences become available is much faster than the rate at which their structure and function are determined [3]. Machine Learning techniques are bringing about new methods that quickly and accurately predict the function [4, 5] and structure [6, 7] of proteins. Despite the great progress currently made on free modeling (FM) methods (mainly due to the incorporation of co-evolutionary information [2, 7, 9, 10]), FM remains computationally expensive (for long proteins) and most of the servers for protein structure prediction