Abstract

Prediction of protein structural classes is one of the most important and challenging tasks in bioinformatics. A protein is classified into one of four main structural classes: all-α, all-β, α/β and α+β. This paper investigates the role of amino acid indices (AAI) combined with traditional amino acid composition (AAC) to create a weighted amino acid composition (WAAC) feature-set for predicting the structural class of a protein. Over 500 amino acid indices are available for building this novel feature-set, which has great potential to increase the accuracy of protein structural class prediction. To evaluate these indices, a high-quality 40% homology dataset is used that contains over 7000 protein sequences (the largest of its kind) extracted from proteomic databases. The predictive technique developed is an optimum k-nearest-neighbour classifier, named multiple-k-nearest-neighbour (MKNN). The classifier is evaluated with a 10-fold cross-validation test procedure throughout the study. Over 1 million analyses were carried out. The highest accuracy, 48.35%, was obtained with index LEVM780101, which is 9% higher than traditional AAC and 6.6% higher than the best sequence-driven-feature sub-set used in other studies. There is great potential for further improvement, as WAAC is the feature-set with the fewest attributes and uses no feature selection. Of the 548 amino acid indices analysed in this study, 536 yielded higher accuracies than traditional AAC and 435 yielded higher accuracies than other sequence-driven features.
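As an illustration of the idea of weighting composition by an index, the following is a minimal Python sketch. The abstract does not give the WAAC formula, so the element-wise product of each residue's frequency with its index value is an assumption made here, as are the function names and the `toy_index` values, which are placeholders and not the real LEVM780101 entries.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every feature
# vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def weighted_composition(sequence, index):
    """Scale each composition value by the index value of that residue.

    ASSUMPTION: the element-wise product is one plausible reading of
    'weighted amino acid composition'; the paper's exact definition
    may differ.
    """
    aac = amino_acid_composition(sequence)
    return [freq * index[aa] for freq, aa in zip(aac, AMINO_ACIDS)]


# Hypothetical index: placeholder weights 1.0, 2.0, ... in alphabet order.
# These are NOT values from any published amino acid index.
toy_index = {aa: float(i + 1) for i, aa in enumerate(AMINO_ACIDS)}

aac = amino_acid_composition("AACD")
waac = weighted_composition("AACD", toy_index)
```

Under this assumption the weighted vector keeps the same 20 attributes as plain AAC, which is consistent with the abstract's remark that WAAC has the fewest attributes among the feature-sets compared. Each vector could then be passed to any nearest-neighbour classifier.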
