Abstract

Extracting features from protein sequences is a challenging problem that can require theoretical and practical knowledge from many fields. The difficulty increases when the features must be extracted solely from the sequences themselves. In this paper, we present a method based on protein granularity. We define the concepts of protein granularity, granularity order, granularity bound, granularity limit, and granularity increment. Protein granularity can extract useful information solely from protein sequences. We provide an approach for constructing feature vectors that include amino acid composition information, sequence-order information, 'neighbor' information for identical amino acids, and sequence length information. Hence, the feature vectors can better represent protein sequences. Our feature extraction method explicitly accounts for the effect of protein sequence length. We carried out a protein structural class prediction experiment. The prediction achieved 96.6% overall accuracy, with success rates of 92.3% for all-α, 100% for all-β, 100% for α/β, and 93.5% for α+β. The last three subset success rates equal the best success rates in the published literature. The overall accuracy of the PG-SVM prediction is the second best result, differing from the best result by only one misclassified protein. The theoretical and experimental results demonstrate that protein granularity can be applied successfully to the feature extraction of protein sequences.
