Abstract

Prediction of long-range interresidue contacts is an important topic in bioinformatics research. It helps in determining protein structures and understanding protein folding, and therefore advances the annotation of protein functions. In this chapter, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers. Each sequence profile center is the average of the sequence profiles of residue pairs belonging to the same class, either contact or noncontact. The GaCs are trained on multiple, mutually different pairs of long-range contact data (positive data) and long-range noncontact data (negative data). The negative datasets, which have roughly the same sizes as the positive ones, are constructed by random sampling from the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Classifiers that use sequence profile centers may advance long-range contact prediction. With this approach, key structural features in proteins could be determined with high efficiency and accuracy.
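
The following is a minimal sketch of the three ingredients named above: class-wise profile centers, balanced negative sampling, and ensemble voting. It is an illustration under stated assumptions, not the chapter's implementation. In particular, a nearest-center rule stands in for the genetic algorithm classifier, every member reuses the same positive set, the data are synthetic Gaussian vectors, and all names (`center`, `balanced_pairs`, `ensemble_predict`, `DIM`) are hypothetical.

```python
import random

def center(rows):
    """Component-wise mean of equal-length feature vectors (a 'profile center')."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def balanced_pairs(pos, neg, n_members, rng):
    """Pair the positives with n_members different, equal-size random draws of negatives."""
    return [(pos, rng.sample(neg, len(pos))) for _ in range(n_members)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def member_predict(centers, v):
    """Return 1 (contact) if v is closer to the contact center, else 0.

    Assumption: this nearest-center rule replaces the genetic algorithm classifier.
    """
    c_pos, c_neg = centers
    return 1 if sq_dist(v, c_pos) < sq_dist(v, c_neg) else 0

def ensemble_predict(members, v):
    """Strict majority vote over members; ties go to noncontact."""
    votes = sum(member_predict(m, v) for m in members)
    return 1 if 2 * votes > len(members) else 0

rng = random.Random(0)
DIM = 40  # hypothetical profile length per residue pair

# Synthetic stand-in data with a 1:20 class imbalance (not real protein data).
pos = [[rng.gauss(1.0, 1.0) for _ in range(DIM)] for _ in range(50)]
neg = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(1000)]

# One (contact center, noncontact center) pair per ensemble member.
members = [(center(p), center(n)) for p, n in balanced_pairs(pos, neg, 5, rng)]
predictions = [ensemble_predict(members, v) for v in pos + neg]
```

Each member sees a different random subset of the negatives, so the members disagree on borderline residue pairs, and the vote averages out that sampling noise. This is one plausible reading of why an ensemble could outperform a single classifier trained on one balanced sample.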

