Abstract

Protein function prediction is a challenging and essential research problem in computational biology. Typically, a protein consists of several structural domains and performs multiple functions. By representing proteins, domains, and functions as bags, instances, and classes, respectively, we can model protein function prediction as a Multi-Instance Multi-Label (MIML) learning problem. Existing MIML algorithms mainly focus on the batch setting, in which all training examples are available before learning begins. This offline paradigm works well in simulation, but it may not be feasible for real-world online applications in which data arrive one by one or chunk by chunk. In this paper, we investigate the protein function prediction problem under a new learning framework, called Online Multi-Instance Multi-Label (OMIML) learning, in which MIML protein examples arrive sequentially. We develop two OMIML algorithms (OMIML-I and OMIML-B) to make predictions for the incoming data. In the proposed OMIML algorithms, variable-length features are constructed to represent the MIML protein examples based on an incremental vocabulary mechanism; the incremental vocabularies of OMIML-I and OMIML-B consist of instances and bags, respectively. We then make an online prediction for each newly arrived protein example by feeding the constructed features into an online multi-label learning algorithm, which is obtained by introducing an artificial label into an online multi-label ranking model. We evaluate the algorithms on a protein dataset covering seven real-world organisms. Experimental results demonstrate the effectiveness of the proposed OMIML algorithms for protein function prediction.
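The abstract names two mechanisms without detailing them: an incremental vocabulary that yields variable-length bag features, and an online multi-label ranker extended with an artificial label. The Python sketch below illustrates one plausible instance-level (OMIML-I-like) reading; it is not the authors' algorithm. The vocabulary growth rule (add an instance lying farther than a hypothetical `radius` from every entry), the Gaussian max-similarity encoding with width `sigma`, and the perceptron-style update are our assumptions, and all class and function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple

Instance = Sequence[float]
Bag = List[Instance]


def _dist(a: Instance, b: Instance) -> float:
    """Euclidean distance between two instance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalVocabulary:
    """Hypothetical instance vocabulary that only grows as bags arrive.

    Assumed rule (not from the paper): an instance becomes a new entry when it
    lies farther than `radius` from every existing entry.
    """

    def __init__(self, radius: float, sigma: float = 1.0) -> None:
        self.radius = radius
        self.sigma = sigma
        self.entries: List[List[float]] = []

    def update(self, bag: Bag) -> None:
        for inst in bag:
            if all(_dist(inst, e) > self.radius for e in self.entries):
                self.entries.append(list(inst))

    def encode(self, bag: Bag) -> List[float]:
        # Feature j = max Gaussian similarity between any instance and entry j,
        # so the vector length equals the current vocabulary size.
        return [
            max(math.exp(-_dist(inst, e) ** 2 / self.sigma) for inst in bag)
            for e in self.entries
        ]


class ArtificialLabelRanker:
    """Perceptron-style online ranker; row 0 scores an artificial threshold label."""

    def __init__(self, num_labels: int) -> None:
        # weights[0]: artificial label; weights[k]: real class k - 1
        self.weights: List[List[float]] = [[] for _ in range(num_labels + 1)]

    def _scores(self, x: List[float]) -> List[float]:
        for w in self.weights:  # zero-pad when the feature vector has grown
            w.extend([0.0] * (len(x) - len(w)))
        return [sum(wi * xi for wi, xi in zip(w, x)) for w in self.weights]

    def predict(self, x: List[float]) -> Set[int]:
        s = self._scores(x)
        # Predicted classes are those ranked above the artificial label.
        return {k - 1 for k in range(1, len(s)) if s[k] > s[0]}

    def update(self, x: List[float], relevant: Set[int]) -> None:
        s = self._scores(x)
        for k in range(1, len(s)):
            is_rel = (k - 1) in relevant
            # Relevant classes should outscore the artificial label; others should not.
            margin = s[k] - s[0] if is_rel else s[0] - s[k]
            if margin <= 0:
                sign = 1.0 if is_rel else -1.0
                for j, xj in enumerate(x):
                    self.weights[k][j] += sign * xj
                    self.weights[0][j] -= sign * xj


def run_online(
    stream: Iterable[Tuple[Bag, Set[int]]], num_labels: int, radius: float = 0.5
) -> List[Set[int]]:
    vocab = IncrementalVocabulary(radius)
    model = ArtificialLabelRanker(num_labels)
    predictions = []
    for bag, labels in stream:
        vocab.update(bag)                      # grow vocabulary (unsupervised)
        x = vocab.encode(bag)                  # variable-length bag features
        predictions.append(model.predict(x))   # predict before labels are revealed
        model.update(x, labels)                # then learn from the true labels
    return predictions


if __name__ == "__main__":
    a, b = [[0.0, 0.0]], [[5.0, 5.0]]
    stream = [(a, {0}), (b, {1}), (a, {0}), (b, {1})]
    print(run_online(stream, num_labels=2))  # prints [set(), {0}, {0}, {1}]
```

Because the artificial label acts as a learned threshold, the predicted label set is simply every class that outscores it, so no separate cutoff has to be tuned. Zero-padding the weight vectors lets the model absorb newly created vocabulary features without retraining on past examples.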
