Abstract

We used various screening techniques, clustering, decision tree, and generalized rule induction (GRI) association models, together with molecular phylogenetic relationships, to search for patterns of halophilicity and to find features that contribute to halolysin salt stability. Met was the sole N-terminal amino acid in halolysin proteins, whereas other amino acids were found at this position in other proteases and thermitase. Feature selection modeling identified eighty-three protein features as important, and just one peer group was found, with an anomaly index of 2.42 that declined to 1.87 when the model was rerun using only the selected important features. The depth of the trees generated by the various decision tree models varied from 1 to 5 branches. With feature selection, the number of peer groups in clustering models was significantly lower (p < 0.05) than for datasets without feature selection. In most decision tree models, the frequency of Gly-Gly was the most important feature for the rule sets, and this feature appeared in the antecedent of most GRI association rules. Charged amino acids differed significantly (p < 0.001) between halolysins and other proteins, with more Asp and Glu in halolysin proteins and more hydrophobic and aliphatic residues in other proteases.
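The abstract does not say how the sequence features were computed. As a minimal sketch, three of the features it names (the N-terminal residue, the Gly-Gly frequency, and the Asp/Glu content) could be derived from a one-letter protein sequence as shown below. The function names, the normalization by number of adjacent pairs, and the toy sequence are illustrative assumptions, not the authors' method.

```python
from collections import Counter

ACIDIC = set("DE")  # one-letter codes for Asp and Glu


def n_terminal_residue(seq: str) -> str:
    """Return the first residue of the sequence (empty string if none)."""
    return seq[:1].upper()


def dipeptide_frequency(seq: str, pair: str = "GG") -> float:
    """Fraction of overlapping adjacent residue pairs equal to `pair`.

    Assumption: frequency is normalized by the number of adjacent pairs
    (len(seq) - 1); the paper may use a different normalization.
    """
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return pairs[pair.upper()] / (len(seq) - 1)


def acidic_fraction(seq: str) -> float:
    """Fraction of residues that are Asp (D) or Glu (E)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for residue in seq if residue in ACIDIC) / len(seq)


# Hypothetical toy sequence, not a real halolysin.
toy = "MGGDE"
features = {
    "n_terminal": n_terminal_residue(toy),
    "gly_gly_freq": dipeptide_frequency(toy, "GG"),
    "acidic_frac": acidic_fraction(toy),
}
```

Feature vectors of this kind, computed per protein, would be the sort of input that a decision tree or association-rule model takes.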
