Abstract

Membrane proteins are unique in that segments thereof concurrently reside in vastly different physicochemical environments: the extracellular space, the lipid bilayer, and the cytoplasm. Accordingly, the effects of missense variants disrupting their sequence depend greatly on the characteristics of the environment of the protein segment affected as well as the function it performs. Because membrane proteins have many crucial roles (transport, signal transduction, cell adhesion, etc.), compromising their functionality often leads to diseases including cancers, diabetes mellitus or cystic fibrosis. Here, we report a suite of sequence-based computational methods "Pred-MutHTP" for discriminating between disease-causing and neutral alterations in their sequence. With a data set of 11,846 disease-causing and 9,533 neutral mutations, we obtained an accuracy of 74% and 78% with 10-fold group-wise cross-validation and test set, respectively. The features used in the models include evolutionary information, physiochemical properties, neighboring residue information, and specialized membrane protein attributes incorporating the number of transmembrane segments, substitution matrices specific to membrane proteins as well as residue distributions occurring in specific topological regions. Across 11 disease classes, the method achieved accuracies in the range of 75-85%. The model designed specifically for the transmembrane segments achieved an accuracy of 85% on the test set with a sensitivity and specificity of 86% and 83%, respectively. This renders our method the current state-of-the-art with regard to predicting the effects of variants in the transmembrane protein segments. Pred-MutHTP allows predicting the effect of any variant occurring in a membrane protein-available at https://www.iitm.ac.in/bioinfo/PredMutHTP/.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call