Abstract

Understanding protein mutations is critical to the study of evolution, drug development, personalized medicine, genomics and many other areas of biology. Protein mutants can be classified using features that are either extracted from data or chosen by an investigator. One feature set that shows initial promise, both for classification and for identifying the basis of each classification, uses the physical properties of amino acids. Protein sequences have been shown to exhibit periodicities and patterns when they are encoded as these physical property values. Encoding a sequence so that each residue becomes a vector of its amino acid's physical property values provides much more information for sequence similarity measures than residue identity alone, and the vector components can serve as features for classifiers. Once a classifier has been constructed and trained, it can be analyzed for feature importance. One of the best methods for this is the permutation importance algorithm: for each feature in turn, the values are randomly permuted across samples and the prediction error is recalculated. The change in error measures the feature's importance to that specific model, and the technique can be applied to any trained classification or regression model. Classifying protein mutants by physical properties and then applying permutation importance can reveal why or how each mutation leads to changes in structure, dynamics or function. Extensions include investigating correlations between properties at residue positions that are correlated in sequence alignments. The method can be used at any scale and with any set of features, making it a potentially powerful tool for protein analysis.
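The encoding and permutation importance steps described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the choice of two properties (the Kyte-Doolittle hydropathy scale and a simplified side-chain charge that treats histidine as neutral), the made-up three-residue peptides and their labels, the rule-based `toy_model` standing in for a trained classifier, and the helper names `encode`, `error_rate` and `permutation_importance`.

```python
import random

# Kyte-Doolittle hydropathy scale (one example property; an assumption of this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (assumption: His treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(sequence):
    """Flatten a sequence into [hydropathy_1, charge_1, hydropathy_2, charge_2, ...]."""
    features = []
    for residue in sequence:
        features.append(HYDROPATHY[residue])
        features.append(CHARGE.get(residue, 0.0))
    return features


def error_rate(predict, X, y):
    """Fraction of samples the model misclassifies."""
    wrong = sum(1 for row, label in zip(X, y) if predict(row) != label)
    return wrong / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in error when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = error_rate(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error_rate(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: 3-residue peptides that differ only at position 2.
sequences = ["AIA", "ALA", "AVA", "AFA", "ADA", "AKA", "AEA", "ARA"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
X = [encode(s) for s in sequences]


def toy_model(row):
    """Stand-in for a trained classifier: uses only hydropathy at position 2 (index 2)."""
    return 1 if row[2] > 0 else 0


importances = permutation_importance(toy_model, X, labels)
names = [f"{prop}@{pos}" for pos in (1, 2, 3) for prop in ("hydropathy", "charge")]
for name, value in zip(names, importances):
    print(f"{name}: {value:.3f}")
```

In this toy setup only hydropathy at position 2 receives a nonzero importance. Charge at position 2 also differs between the two classes, yet its importance is zero because `toy_model` never reads it, which illustrates that permutation importance describes a specific trained model, not the data alone.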
