A survey on heterogeneous transfer learning

Oscar Day, Taghi M Khoshgoftaar

doi:10.1186/s40537-017-0089-0

Open Access

Abstract

Transfer learning has proven effective in many real-world applications because it exploits knowledge in labeled training data from a source domain to improve a model’s performance in a target domain that has little or no labeled training data. Using a labeled source, or auxiliary, domain to aid a target task can greatly reduce the cost and effort of collecting enough training labels to build an effective model for the new target distribution. Currently, most transfer learning methods assume that the source and target domains share the same feature space, which greatly limits their applicability, because it may be difficult to collect auxiliary labeled source domain data that shares the target domain's feature space. Recently, heterogeneous transfer learning methods have been developed to address this limitation. This, in effect, extends transfer learning to many other real-world tasks, such as cross-language text categorization and text-to-image classification. Heterogeneous transfer learning is characterized by source and target domains with differing feature spaces, and it may also be combined with other issues such as differing data distributions and label spaces. These can present significant challenges, as one must develop a method to bridge the feature spaces, data distributions, and other gaps that may be present in these cross-domain learning tasks. This paper contributes a comprehensive survey and analysis of current methods for heterogeneous transfer learning, providing an updated, centralized view of current methodologies.

Highlights

Machine learning is of increasing importance due to its success and benefit in real-world applications
The proposed semi-supervised Heterogeneous Feature Augmentation (SHFA) achieved significantly better classification accuracy than the other baselines in these experiments, including standard Heterogeneous Feature Augmentation (HFA)
A model trained on little or no labeled training data will have insufficient discriminatory ability and will be unable to predict accurately

Summary

Introduction

Machine learning is of increasing importance due to its success and benefit in real-world applications. Machine learning models are trained on a series of examples, each composed of features (attributes) associated with a single label. This label can be a class value for classification tasks or a numerical value for regression tasks [1]. In unsupervised tasks, labels are not provided during training, which can make the training process more challenging. Once a model is trained, we can apply it to predict the value for a newly arriving, unseen instance. If the ground truth label is available, we can compare it to the predicted value to calculate performance metrics [2] for the model.
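
The train, predict, and evaluate cycle described above can be illustrated with a minimal sketch. It assumes a toy nearest-centroid classifier and made-up two-feature data; the function names (train_centroids, predict, accuracy) and the data values are hypothetical and are not taken from the survey or from any method it reviews.

```python
from collections import defaultdict


def train_centroids(examples):
    """Fit a nearest-centroid classifier: one mean feature vector per class label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, labeled_examples):
    """Compare predictions with ground truth labels; return the fraction correct."""
    correct = sum(predict(centroids, x) == y for x, y in labeled_examples)
    return correct / len(labeled_examples)


# Hypothetical toy data: each example is (features, class label).
train = [([1.0, 1.2], "a"), ([0.8, 1.0], "a"),
         ([3.0, 3.1], "b"), ([3.2, 2.9], "b")]
# Held-out examples whose ground truth labels are known.
test = [([0.9, 1.1], "a"), ([3.1, 3.0], "b"), ([2.4, 2.0], "a")]

model = train_centroids(train)
print(predict(model, [1.1, 0.9]))  # predicted label for an unseen instance
print(accuracy(model, test))       # fraction of test labels predicted correctly
```

Accuracy is only one possible performance metric; it is used here because it follows directly from comparing each predicted value with its ground truth label.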
