Abstract

With advances in genetic sequencing technology, the automated assignment of protein function has become a key challenge in bioinformatics and computational biology. In nature, many proteins consist of several structural domains, and each domain typically performs its own function independently or implements a new function in cooperation with neighboring domains. Thus, multi-domain protein function prediction can be formulated as a multi-instance multi-label (MIML) learning task, in which each protein is treated as a bag of domain instances associated with a set of function labels. In this paper, we propose a novel ensemble MIML algorithm, the multi-instance multi-label randomized clustering forest (MIMLRC-Forest), for protein function prediction. In MIMLRC-Forest, we build a set of hierarchical clustering trees and apply a label transfer mechanism to identify the relevant function labels during learning. Owing to its hierarchical structure, each clustering tree handles the multi-label problem by exploiting the more discriminable label concepts at higher-level nodes and transferring the less discriminable labels to lower-level nodes. The label dependency can then be computed by aggregating the tree labels for protein function prediction. Extensive experiments on five real-world protein data sets show the effectiveness of the proposed algorithm compared with several state-of-the-art baselines, including MIMLSVM, MIMLNN, MIML-kNN, EnMIMLNN, and M3MIML.
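To make the MIML formulation more concrete, the following Python sketch shows one way to represent a protein as a bag of domain instances with a protein-level label set, and one simple way an ensemble could combine per-tree label outputs into a final prediction. It is not the MIMLRC-Forest implementation: the `ProteinBag` class, the `aggregate_tree_votes` function, the per-tree score dictionaries, the averaging rule, the 0.5 threshold, and the labels `f1`/`f2` are all hypothetical assumptions for illustration. The clustering trees and the label transfer mechanism themselves are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class ProteinBag:
    """One MIML example: a protein as a bag of domain instances plus its label set."""
    # Each instance is a (hypothetical) feature vector describing one structural domain.
    instances: List[List[float]]
    # Function labels attach to the whole protein (the bag), not to individual domains.
    labels: Set[str] = field(default_factory=set)


def aggregate_tree_votes(
    tree_outputs: Sequence[Dict[str, float]], threshold: float = 0.5
) -> Set[str]:
    """Average per-label scores over all trees and keep labels at or above threshold.

    Assumption: simple averaging is used purely for illustration; it is not
    claimed to be the aggregation rule of MIMLRC-Forest. A label missing from
    a tree's output contributes a score of 0 for that tree.
    """
    if not tree_outputs:
        return set()
    totals: Dict[str, float] = {}
    for scores in tree_outputs:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    n_trees = len(tree_outputs)
    return {label for label, total in totals.items() if total / n_trees >= threshold}


# A protein with two domains; its label set belongs to the bag as a whole.
protein = ProteinBag(instances=[[0.1, 0.7], [0.9, 0.2]], labels={"f1"})

# Hypothetical outputs of three trees for that protein.
votes = [
    {"f1": 1.0, "f2": 0.0},
    {"f1": 1.0, "f2": 1.0},
    {"f1": 0.0},
]
predicted = aggregate_tree_votes(votes)
```

Here `f1` receives an average score of 2/3 and is kept, while `f2` averages 1/3 and is dropped, so `predicted` is `{"f1"}`. Averaging was chosen only because it is the simplest ensemble combination; a method that models label dependency would replace this step.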
