Abstract

Background

The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published q-Norm MKL algorithm.

Results

We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice site dataset with a given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities ab initio along with the Multitask classifiers. In both cases, we outperform the baseline methods we compare against.

Conclusions

We present a novel approach to Multitask Learning that is capable of learning task similarity along with the classifiers. The framework is very general, as it allows incorporating prior knowledge about task relationships where available, but is also able to identify task similarities in the absence of such prior information. Both variants show promising results in applications from Computational Biology.

Highlights

  • The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology

  • Starting from a special case, where there exists a single meta-task consisting of all tasks, we show that inferring the latent structure can be cast as a Multiple Kernel Learning problem, where the base kernels are defined with respect to Dirac kernels [9] that establish the relatedness of all possible task combinations and correspond to all possible meta-tasks

  • Before we describe our formulation of Multi-task learning (MTL) as an MKL problem, we briefly review the formulations of MTL and MKL that lay the foundations for our approach
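The Dirac-kernel construction in the highlights above can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation): the function names, the linear base kernel, and the fixed weights `betas` are all assumptions for demonstration; in the actual method, the weights over candidate meta-tasks would be learned by q-Norm MKL.

```python
import numpy as np
from itertools import combinations

def dirac_kernel(tasks, group):
    """Dirac kernel for one candidate meta-task: entry (i, j) is 1
    iff examples i and j both belong to tasks inside `group`."""
    member = np.isin(tasks, list(group)).astype(float)
    return np.outer(member, member)

def multitask_kernel(K_base, tasks, groups, betas):
    """Weighted sum of base kernels, each being the data kernel
    masked (element-wise) by the Dirac kernel of one meta-task."""
    K = np.zeros_like(K_base)
    for beta, group in zip(betas, groups):
        K += beta * K_base * dirac_kernel(tasks, group)
    return K

# Toy example: 4 examples from tasks 0 and 1, linear base kernel.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
tasks = np.array([0, 0, 1, 1])
K_base = X @ X.T

# Candidate meta-tasks: every non-empty subset of the task set {0, 1}.
groups = [g for r in (1, 2) for g in combinations([0, 1], r)]
betas = [0.5, 0.5, 1.0]  # placeholder weights; MKL would learn these
K = multitask_kernel(K_base, tasks, groups, betas)
```

The resulting `K` can be passed to any kernel classifier (e.g. an SVM with a precomputed kernel); with many tasks, the exponential number of subsets would of course be restricted to a candidate set, such as the nodes of a given task hierarchy.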


