Abstract

Knowledge of protein structure assignment enriches the structural and functional understanding of proteins. Accurate and reliable structure assignment data are crucial for secondary structure prediction systems. Since the 1980s, various methods based on hydrogen bond analysis and atomic coordinate geometry, followed by machine learning, have been employed in protein structure assignment. However, the assignment process becomes challenging when atoms are missing from the protein files. We propose DLFSA, a multi-class classifier that assigns protein secondary structure elements (SSEs) using convolutional neural networks (CNNs). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model is trained on a subset of the protein fragments and achieves 88.1% training accuracy and 82.5% test accuracy. It uses only Cα coordinates for secondary structure assignment, and it has also been tested successfully on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning to structure assignment problems.

Highlights

  • Pauling and Corey identified the existence of regular substructures, namely α-helices (H) and β-sheets (E), in protein molecules [1]

  • State-of-the-art systems use machine learning with manually engineered feature extraction, and none of the assignment systems currently available is entirely based on Deep Learning techniques

  • We developed a CNN-based model to automate the protein structure assignment process


Summary

Introduction

Pauling and Corey identified the existence of regular substructures, namely α-helices (H) and β-sheets (E), in protein molecules [1]. Irregular curves connecting these regular structures are called coils (C) [2,3,4]. This three-state classification extends to a finer eight-state classification. Protein structure assignment is the process of associating secondary structure information with the experimentally determined coordinates of a protein. Most protein structure modelling systems use this secondary structure information in their initial steps, because it cuts down the conformational search space substantially and thereby accelerates the whole prediction process [6,7,8]. These secondary structure prediction systems require structure assignment data that serve as ground truth for training the models.
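To make the idea of working from Cα coordinates alone more concrete, the following is a minimal sketch of how fixed-length fragments could be cut from a protein file and turned into a numeric input for a classifier over the three states H, E and C. It is not the DLFSA pipeline: the function names, the sliding-window scheme, the fragment length and the pairwise-distance representation are all assumptions made for illustration, and the sketch runs on the CPU rather than using the GPU-based procedure mentioned in the abstract. Only the column positions of the PDB `ATOM` record are taken from the standard file format.

```python
import math

# The three coarse states named in the text: helix, sheet (strand), coil.
SSE_CLASSES = ("H", "E", "C")


def parse_ca_coords(pdb_lines):
    """Return (x, y, z) tuples for every C-alpha ATOM record.

    Uses the fixed-column layout of PDB ATOM records:
    atom name in columns 13-16, x/y/z in columns 31-38, 39-46, 47-54.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


def sliding_fragments(coords, length):
    """Cut overlapping fragments of `length` consecutive residues.

    Hypothetical fragmenting scheme (window with stride 1); returns an
    empty list when the chain is shorter than `length`.
    """
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]


def distance_matrix(fragment):
    """Pairwise C-alpha distances for one fragment.

    An assumed 2D representation: it is invariant to rotation and
    translation, which makes it a convenient image-like input for a CNN.
    """
    return [[math.dist(a, b) for b in fragment] for a in fragment]
```

A typical use would read the lines of a PDB file, call `parse_ca_coords`, and pass the result to `sliding_fragments` with a chosen window length; each fragment's `distance_matrix` could then be paired with a label from `SSE_CLASSES` taken from an existing assignment to form a training example.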

