Abstract

DNA-binding proteins play important roles in many cellular processes, and identifying them is essential for understanding and interpreting protein function. This manuscript presents algorithms for feature representation based on primary protein sequences and for selective ensemble classification. We first propose a multi-source interaction fusion feature representation model that simultaneously considers interactions among physicochemical properties, evolutionary information, and gap distances between residues. We then provide a selective ensemble algorithm based on gap distances that yields diverse base classifiers by selecting feature subspaces, which improves the generalization ability of the integrated classifier. We compare the proposed algorithms with several state-of-the-art methods on multiple datasets. The experimental results show that the proposed algorithms are competitive and identify DNA-binding proteins effectively. The major contributions of the present study are a model and algorithm for feature representation that incorporates interaction effects, and a selective ensemble classification algorithm based on parameter perturbation. The proposed algorithms can also be applied to other biological questions involving amino acid sequences.
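The abstract does not give formulas, so the following is only a minimal sketch of the general idea: one feature subspace per gap distance, and an ensemble that keeps a subset of the per-gap base classifiers. Everything here is an assumption made for illustration rather than the authors' method. The autocovariance-style interaction term, the use of the Kyte-Doolittle hydropathy scale as the single physicochemical property, the selection by validation accuracy, and all function names are hypothetical, and evolutionary information is omitted entirely.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, standing in for "a physicochemical property".
# The choice of scale is an assumption; the abstract does not name one.
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gap_interaction(seq, gap, scale=HYDRO):
    """Mean product of centered property values for residue pairs `gap` apart.

    Hypothetical autocovariance-style term; residues missing from `scale`
    are skipped.
    """
    values = [scale[aa] for aa in seq if aa in scale]
    if gap < 1 or len(values) <= gap:
        return 0.0
    mu = mean(values)
    return mean(
        (values[i] - mu) * (values[i + gap] - mu)
        for i in range(len(values) - gap)
    )


def feature_subspaces(seq, gaps):
    """Build one feature subspace per gap distance (here a single feature each)."""
    return {g: [gap_interaction(seq, g)] for g in gaps}


def select_and_vote(predictions, val_accuracy, keep):
    """Keep the `keep` most accurate base classifiers, then majority-vote.

    `predictions` maps gap -> 0/1 label from that gap's base classifier;
    `val_accuracy` maps gap -> validation accuracy. Selecting by accuracy is
    an assumed criterion, not one stated in the source. Ties vote 0.
    """
    chosen = sorted(val_accuracy, key=val_accuracy.get, reverse=True)[:keep]
    votes = [predictions[g] for g in chosen]
    return chosen, int(sum(votes) * 2 > len(votes))
```

In this sketch, varying the gap distance is what perturbs the feature space, so each gap would train its own base classifier on `feature_subspaces(seq, gaps)[g]`, and `select_and_vote` combines only the retained ones.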
