Abstract

Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly owing to the availability of large sequence alignments of protein families. However, identifying features that are specific to subclasses, rather than shared by all members of the family, has remained elusive for sequence-based approaches. Here we present a coevolution-based method that differentially analyzes subfamily-specific structural features through a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test the method's predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting allows one to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.

Highlights

  • The last decade has seen the emergence and maturation of coevolutionary methods aimed at predicting functionally interacting residue pairs from sequence alignments of homologous protein sequences [1,2,3,4,5]

  • The success of covariation-based contact prediction relies on the availability of deep multiple sequence alignments (MSAs) of homologous proteins

  • To investigate how subfamily specific structural features can be extracted from complex protein families, we focus on the abundant and well-characterized family of bacterial response regulators (RR)

Summary

Introduction

The last decade has seen the emergence and maturation of coevolutionary methods aimed at predicting functionally interacting residue pairs from sequence alignments of homologous protein sequences [1,2,3,4,5]. Organisms with different genetic backgrounds, evolving in different environments, are subject to different fitness landscapes; they do not necessarily require exactly the same structural and functional features, even though they maintain the same overall fold and function [20,21]. These observations imply that not all members of a large protein family will necessarily satisfy all contacts predicted by coevolutionary analysis. The limited statistical weight of smaller subfamilies within a global alignment may prevent the identification of their specific features in a standard analysis. This last point is of particular importance when inspecting or modelling precise features that pertain to particular members of a protein family rather than features common to the whole family. We build upon the reweighting concept introduced in [10], showing in a complex multi-dimensional case how this reweighting strategy can be used to identify subfamily-specific contacts even when the number of sequences is very low.
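The point about the limited statistical weight of small subfamilies can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the function names, the subfamily labels, and the `emphasis` parameter are all hypothetical, and the weighting rule (a single multiplicative factor for the target subfamily) is only an assumed stand-in for a continuous reweighting scheme. It shows how per-column residue frequencies, the raw statistics that coevolutionary analyses start from, shift when sequences from a minority subfamily are given more weight.

```python
from collections import Counter


def weighted_frequencies(msa, weights):
    """Per-column residue frequencies, counting each sequence with its weight."""
    total = sum(weights)
    freqs = []
    for col in range(len(msa[0])):
        counts = Counter()
        for seq, w in zip(msa, weights):
            counts[seq[col]] += w
        freqs.append({aa: c / total for aa, c in counts.items()})
    return freqs


def subfamily_weights(labels, target, emphasis):
    """Hypothetical rule: weight `emphasis` for the target subfamily, 1.0 otherwise."""
    return [emphasis if lab == target else 1.0 for lab in labels]


# Toy alignment: three sequences from a large subfamily, one from a small one.
msa = ["AC", "AC", "AC", "GC"]
labels = ["big", "big", "big", "small"]

# emphasis = 1.0 reproduces the unweighted, family-wide statistics.
uniform = weighted_frequencies(msa, subfamily_weights(labels, "small", 1.0))
# emphasis = 3.0 gives the single "small" sequence as much weight as the rest combined.
focused = weighted_frequencies(msa, subfamily_weights(labels, "small", 3.0))

print(uniform[0])
print(focused[0])
```

Because `emphasis` can be varied continuously, the same alignment can be analyzed at any point between the family-wide view and a subfamily-focused view; the residue conserved across the whole family (the second column here) is unaffected by the choice.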

Results
Prediction
Sequence Data Collection and Pre-Processing
Structural Data Collection and Processing
Direct-Coupling Analysis and Sequence Reweighting
Kernel Function Scoring
