Abstract

Secondary structure elements in protein molecules are local sub-conformational regions stabilized by hydrogen bonding. They fall into three classes: helix, sheet, and loop. Secondary structure elements underpin the folding and topology of a protein and are important to modern structural bioinformatics tasks such as protein modeling and functional analysis, so assigning secondary structure types to the residues of a protein is crucial. Many methods have been developed for this problem, and they follow two approaches. The first uses hydrogen bonding and energy information, while the second uses the geometry of the protein trace. When the coordinates of some atoms are missing, the second approach is more feasible. In this paper, we develop a 3-state machine learning classifier that follows the second approach and uses only the Cα information of the protein. The classifier is an ensemble of four machine learning models: Random Forest, Support Vector Machine, Multilayer Perceptron, and eXtreme Gradient Boosting. It was trained on 600K amino acids and tested on two data sets. On the first, which contains 150K amino acids, the classifier reached 94.6% accuracy. On the second, a set of 20 protein structures, it was compared with PCASSO, a method from the same category, using the secondary structure information in the Protein Data Bank (PDB) as the reference. The comparison shows that our method produces assignments that agree more closely with the PDB, at 93% accuracy, while PCASSO achieved 84% accuracy.
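
As an illustration of the trace-geometry approach described above, the following minimal sketch computes a few Cα-only features and combines per-residue labels from several models. The abstract does not specify the feature set or how the four models are combined, so everything here is an assumption: the function names (`ca_features`, `majority_vote`, `accuracy`), the choice of i→i+2 and i→i+3 distances plus a pseudo-torsion angle as features, the hard majority vote, and its tie-breaking rule are all hypothetical. The labels H, E, and C stand for helix, sheet, and loop.

```python
import math
from collections import Counter


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_torsion(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive C-alpha atoms."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    norm_b1 = math.sqrt(_dot(b1, b1))  # zero only if p1 and p2 coincide
    m1 = _cross(n1, tuple(c / norm_b1 for c in b1))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def ca_features(trace):
    """Assumed feature set: for each window of four consecutive C-alpha
    atoms, the i->i+2 distance, the i->i+3 distance, and the pseudo-torsion."""
    feats = []
    for i in range(len(trace) - 3):
        a, b, c, d = trace[i:i + 4]
        feats.append((math.dist(a, c), math.dist(a, d),
                      pseudo_torsion(a, b, c, d)))
    return feats


def majority_vote(predictions):
    """Hard vote over per-model label sequences of equal length.
    Assumed tie-break: the earliest model whose label is among the top-voted."""
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(next(lab for lab in labels if counts[lab] == top))
    return voted


def accuracy(assigned, reference):
    """Fraction of residues whose assigned label matches the reference."""
    matches = sum(a == r for a, r in zip(assigned, reference))
    return matches / len(reference)
```

In use, each of the four models would predict one label per residue from features like these, `majority_vote` would merge the four label sequences, and `accuracy` would compare the result against a reference assignment such as the one recorded in the PDB. With four voters a 2-2 tie is possible, which is why the sketch needs an explicit tie-breaking rule.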
