The prediction of protein contact maps is a challenging step in determining three-dimensional protein structures. In this paper, we introduce Forest of Decision Trees, a methodology for predicting protein contact maps based on (1) a divide-and-conquer approach to analyzing the prediction problem; (2) a codification vector that combines information from the neighborhoods of the target amino acids and from the sub-sequence between them; (3) an ensemble of classifiers that employs a hybrid of Genetic Algorithms and Decision Trees as base classifiers; and (4) a rule-based interpretation mechanism. A comparison against the top sequence-based methods in CASP10 showed that our predictor is very competitive and highly reliable. Its main advantage is its capability to generate a human-comprehensible, rule-based interpretation, which gives specialists clues toward simpler, interpretable solutions for protein fold recognition and the prediction of unknown structures.

Keywords: CASP10, contact map prediction, decision trees, genetic algorithms, multiple classifier systems, protein structure prediction.