Abstract

The prediction of protein secondary structure is one of the classic problems of bioinformatics and has several practical applications. In this work, we present an ensemble of bidirectional recurrent networks and random forests. The classifiers were fused using a weight for each classifier and each class, with the weights found by a genetic algorithm search. To evaluate our method, we used the PDB, CB6133 and CB513 datasets. Using only amino acid sequence information, we achieved 73.1%, 59.1% and 55.8% Q8 accuracy on PDB, CB6133 and CB513 proteins, respectively, and 81.5% Q3 accuracy on PDB proteins. To compare our results against the literature, we also evaluated our method using both amino acid sequence and sequence profile features, achieving 73.4% and 68.9% Q8 accuracy on CB6133 and CB513, respectively. Our method yielded good results when using only the amino acid sequence and is competitive with the literature when using amino acid sequence information together with protein sequence similarity.
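The per-classifier, per-class weighted fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_predictions` and `accuracy`, the array shapes, the toy probabilities and the weight values are all assumptions made for illustration, and three classes are used instead of eight to keep the example small. In the paper's setting the weights would be found by a genetic algorithm, for which a score such as `accuracy` could plausibly serve as the fitness function; here the weights are simply fixed by hand.

```python
import numpy as np


def fuse_predictions(probs, weights):
    """Fuse class probabilities from several classifiers (illustrative sketch).

    probs:   shape (n_classifiers, n_residues, n_classes)
    weights: shape (n_classifiers, n_classes), one weight per classifier and class
    Returns the index of the highest weighted score for each residue.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Broadcast each classifier's class weights over all residues, then sum
    # the weighted scores across classifiers.
    scores = (probs * weights[:, None, :]).sum(axis=0)
    return scores.argmax(axis=1)


def accuracy(predicted, true_labels):
    """Fraction of residues labelled correctly (Q8 when there are 8 classes)."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true_labels)))


# Toy data (hypothetical): 2 classifiers, 3 residues, 3 classes.
probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],  # classifier A
    [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.5, 0.2, 0.3]],  # classifier B
])
true_labels = np.array([1, 2, 2])

uniform = np.ones((2, 3))                 # unweighted sum of probabilities
weights = np.array([[1.0, 0.5, 1.0],      # hand-picked, not GA-optimized
                    [0.5, 1.0, 1.0]])

print(accuracy(fuse_predictions(probs, uniform), true_labels))
print(accuracy(fuse_predictions(probs, weights), true_labels))
```

In this toy case the hand-picked weights change the fused label of the third residue, which shows why weighting per class, and not only per classifier, can matter.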
