Abstract

Proteins are diverse in sequence, structure, and function, so it is important to study the relations among the three. In this paper, we survey the relations between protein sequences and their structures. We use the natural vector (NV) and the averaged property factor (APF) features to represent protein sequences as feature vectors, and use the multi-class minimum squared error (MSE) and convex hull methods to separate proteins of different structural classes into different regions. We found that proteins from different structural classes are separable by hyperplanes and convex hulls in the natural vector feature space: the feature vectors of different structural classes fall into disjoint regions or convex hulls in the high-dimensional feature space. The natural vector outperforms the averaged property factor method in identifying the structures, and the convex hull method outperforms the multi-class MSE in separating the feature points. These outcomes support a strong connection between protein sequences and their structures, and may imply that the amino acid composition and sequence arrangement represented by the natural vectors have a greater influence on structure than the averaged physical property factors of the amino acids.
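To make the natural vector representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the definition commonly used in the natural vector literature: for each of the 20 standard amino acids, the count, the mean position (1-indexed), and the second central moment of the positions normalized by the count times the sequence length, giving a 60-dimensional vector. The function name `natural_vector`, the ordering of the components, and the handling of absent amino acids (all three entries set to zero) are assumptions made for illustration.

```python
# Sketch of a 60-dimensional natural vector for a protein sequence.
# Assumed definition (not confirmed by this page): for amino acid k,
#   n_k  = number of occurrences,
#   mu_k = mean of its 1-indexed positions,
#   D2_k = sum((position - mu_k)^2) / (n_k * n), with n the sequence length.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical


def natural_vector(seq):
    """Return [n_1..n_20, mu_1..mu_20, D2_1..D2_20] as a list of floats."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        n_k = len(positions)
        if n_k == 0:
            # Assumption: an absent amino acid contributes zeros.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


if __name__ == "__main__":
    v = natural_vector("ACA")
    print(len(v), v[0], v[20], v[40])
```

Under this definition, sequences with the same amino acid composition but different residue arrangements generally map to different points, which is the property the abstract attributes to the natural vector features.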

Highlights

  • Proteins are important organic molecules in life

  • We found that the natural vectors of different structural classes are separable by minimum squared error (MSE) hyperplanes and convex hulls, which indicates that the natural vectors of different structural classes occupy different regions in the high-dimensional feature space

  • All feature extraction analyses are compared with the PseAAC [21,22,23] and PSSM [24,25] analyses, and the classification analyses are compared with the support vector machine (SVM) [26] and random forest [27,28] analyses

Summary

Introduction

Proteins are important organic molecules in life. They vary in sequence, structure, and function [1,2,3,4,5,6,7]. It is believed that protein functions are influenced by their structures, and that protein structures are in turn influenced by their sequences [1,2,3,4,5,6,7]. Protein structural classification and prediction is an active topic in bioinformatics research that addresses the relations between protein sequences and their structures [8,9,10,11,12,13,14,15,16].
