Abstract

Multi-Instance (MI) learning has proven effective for genome-wide protein function prediction problems in which each training example is associated with multiple instances. Many studies have sought an appropriate Multi-Instance Learning (MIL) method for genome-wide protein function prediction under the usual assumption that the underlying distribution of the testing data (target domain, TD) is the same as that of the training data (source domain, SD). However, this assumption may be violated in practice. To tackle this problem, we propose a Multi-Instance Metric Transfer Learning (MIMTL) approach for genome-wide protein function prediction. In MIMTL, we first transfer the source domain distribution to the target domain distribution by utilizing bag weights. Then, we construct a distance metric learning method with the reweighted bags. Finally, we develop an alternating optimization scheme for MIMTL. Comprehensive experimental evidence on seven real-world organisms verifies the effectiveness and efficiency of the proposed MIMTL approach over several state-of-the-art methods.
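
The bag reweighting step can be read through the standard importance-weighting identity. The block below is a sketch in notation assumed here rather than taken from the paper: \ell is a generic loss on a bag X, p_{SD} and p_{TD} are the source and target bag densities, and w(X) is their ratio. The identity holds whenever p_{SD}(X) > 0 wherever p_{TD}(X) > 0.

```latex
\mathbb{E}_{X \sim p_{TD}}\left[\ell(X)\right]
  = \mathbb{E}_{X \sim p_{SD}}\left[\frac{p_{TD}(X)}{p_{SD}(X)}\,\ell(X)\right]
  = \mathbb{E}_{X \sim p_{SD}}\left[w(X)\,\ell(X)\right]
```

Under this reading, a loss averaged over source bags with weights w(X) estimates the loss that would be incurred on target bags.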

Highlights

  • During the past decades, a variety of computational methods have been proposed to tackle the genome-wide protein function prediction problem[1,2,3].

  • A number of transfer learning methods[16,17] have been developed. Most of these transfer learning algorithms are designed for single-instance learning (SIL), where each training example is represented by one instance.

  • We compare the performance of Multi-Instance Metric Transfer Learning (MIMTL) (the source code of MIMTL will be released upon publication of the paper) with several state-of-the-art multi-instance learning methods, including MIMLSVM[6], MIMLNN[6], EnMIMLNN[4] and MICS[31].


Summary

Related Works

Previous studies related to our work can be classified into three categories: traditional MIL, metric-learning-based MIL and transfer-learning-based MIL. Unlike MIMLkNN, the Multi-instance Multi-label Support Vector Machine (MIMLSVM)[6] first degenerates the MIL task into a simplified single-instance learning (SIL) task by applying a clustering-based representation transformation[6,27], which transforms each training bag into a single instance. MIML-DML[5] attempts to find a distance metric under which bag pairs from the same category are closer than bag pairs from different categories. These metric-based MIL approaches are designed for the traditional MIL problem, where the bags in SD and TD are drawn from the same distribution. MICS does not present a method for incorporating the learned weights into multi-instance metric learning.
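
To make the clustering-based representation transformation concrete, here is a minimal Python sketch of how a bag can be turned into a single instance. It rests on assumptions that the text above does not state: bags are compared with the maximal Hausdorff distance, the medoid bags are already chosen (the clustering step itself is omitted), and the names `hausdorff`, `bags_to_instances` and `medoid_indices` are hypothetical.

```python
import numpy as np

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of shape (n_instances, d)."""
    # Pairwise Euclidean distances between every instance of bag_a and bag_b.
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    # Directed distances: the farthest nearest neighbour in each direction.
    return max(dists.min(axis=1).max(), dists.min(axis=0).max())

def bags_to_instances(bags, medoid_indices):
    """Represent each bag by its distances to the chosen medoid bags."""
    medoids = [bags[i] for i in medoid_indices]
    return np.array([[hausdorff(bag, m) for m in medoids] for bag in bags])

# Toy data: three bags with different numbers of 2-D instances.
bags = [
    np.array([[0.0, 0.0], [1.0, 0.0]]),
    np.array([[0.0, 1.0]]),
    np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]),
]
# Assumption: bags 0 and 2 were selected as medoids by some clustering step.
features = bags_to_instances(bags, medoid_indices=[0, 2])
print(features.shape)
```

Each row of `features` is a fixed-length vector, so any single-instance learner can be trained on it regardless of how many instances the original bag contained.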

Method
2: Centralize the input data:
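
The formula for this step is not given in the summary. As a hedged illustration only, the sketch below centers the data by subtracting the mean of all instances pooled across bags. Pooling over instances (rather than over bags) is an assumption, and `center_bags` is a hypothetical helper name.

```python
import numpy as np

def center_bags(bags):
    """Subtract the pooled instance mean from every instance in every bag."""
    # Stack all instances from all bags into one (total_instances, d) array.
    pooled = np.vstack(bags)
    mean = pooled.mean(axis=0)
    # Broadcasting subtracts the same mean vector from each instance.
    return [bag - mean for bag in bags], mean

bags = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[5.0, 6.0]]),
]
centered, mean = center_bags(bags)
print(mean)
```

The same `mean` would need to be reused when centering target-domain bags if both domains are to share one coordinate system; whether the method does this is not stated in the summary.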
Results