Abstract
BackgroundAccurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure. Analysis of co-evolutionary events among residues has been proved effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although being widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge.ResultsIn this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA using pseudo-likelihood, i.e., the product of conditional probability of individual residues, our approach uses composite-likelihood, i.e., the product of conditional probability of all residue pairs. Composite likelihood has been theoretically proved as a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including PSICOV dataset and CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy. ii) When equipped with deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset.ConclusionsComposite likelihood maximization algorithm can efficiently estimate the parameters of Markov Random Fields and can improve the prediction accuracy of protein inter-residue contacts.
Highlights
Accurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure
We present comprehensive experiments on popular benchmark datasets, including PSICOV dataset and CASP-11 dataset
PSICOV dataset contains 150 proteins and each protein has a highly resolved X-ray crystallographic structure available and the length ranges from 50 to 275; CASP11 dataset is from CASP11 experiments and contains 85 proteins[38]
Summary
Accurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure. The native structures are stabilized by local and global interactions among residues, forming inter-residue contacts with proximity [2]. A great variety of studies have been conducted for predicting inter-residue contacts, which fall into two categories, namely, supervised learning approaches and purely-sequence-based approaches. Supervised learning approaches [7,8,9,10] use training sets composed of residue pairs and contact labels indicating whether these residue pairs form contact or not. Wang et al applied deep learning techniques to denoise predicted inter-residue contacts, and successfully used predicted contacts to build tertiary structures of several membrane proteins [23]
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have