Abstract
Background: The flexibility of a protein structure is often related to the protein's function. Feature selection (FS) is critical to many machine learning applications that deal with small samples and high-dimensional data. To predict flexible regions from protein sequences, it is important to build a machine learning method based on an effective feature selection technique. Such a method may also provide new insight into the protein folding process.

Method: First, the frequencies of k-spaced amino acid pairs are taken as a representation of the local sequences. Second, these representations are processed by feature selection based on increment of diversity (FSID) to reduce the dimensionality. Finally, logistic regression is applied to integrate the selected features into a scheme that discriminates flexible from rigid regions (referred to as FSID_FRP).

Results: From a set of 66 sequences, 74 features are selected: 26 flexible patterns and 48 rigid patterns. Most of the flexible patterns are associated with Glycine or Proline, and most of the rigid patterns with Leucine or Valine. The FSID_FRP method, which applies logistic regression to the 74-feature representation, obtained 79.41% accuracy and a Matthews correlation coefficient (MCC) of 0.51. These results are comparable to those of the FlexRP method, which uses 95 features.

Conclusion: A simple feature selection method, FSID, is shown to be very efficient for predicting the flexible and rigid regions of protein sequences. It is more appropriate for small-sample classification than the entropy-based feature selection method. The proposed FSID_FRP method achieved approximately 80% prediction accuracy (79.41%) and stronger generalization ability.

Keywords: Feature selection, increment of diversity, k-spaced amino acid pairs, logistic regression, protein flexible regions, protein sequences.
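
As an illustration of the first step of the Method, the frequency of k-spaced amino acid pairs in a local sequence segment could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `k_spaced_pair_frequencies`, the restriction to the 20 standard residues, and the normalization by the number of available pair positions are all assumptions made for the example.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_frequencies(segment, k):
    """Return the frequency of every residue pair separated by exactly k residues.

    A pair (a, b) is counted when a is at position i and b is at position
    i + k + 1, so k = 0 corresponds to adjacent residues. Frequencies are
    normalized by the number of pair positions in the segment (an assumed
    normalization; the abstract does not specify one).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_positions = len(segment) - k - 1
    if n_positions <= 0:
        # Segment too short to contain any k-spaced pair.
        return {p: 0.0 for p in pairs}
    for i in range(n_positions):
        pair = segment[i] + segment[i + k + 1]
        if pair in counts:  # skip non-standard residue codes such as 'X'
            counts[pair] += 1
    return {p: c / n_positions for p, c in counts.items()}


# Hypothetical five-residue segment, pairs separated by one residue (k = 1).
freqs = k_spaced_pair_frequencies("GAGPG", k=1)
nonzero = {p: f for p, f in freqs.items() if f > 0}
print(nonzero)
```

Each value of k yields a 400-dimensional vector, so concatenating several values of k quickly produces a high-dimensional representation from few samples, which is the setting in which the abstract motivates feature selection.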