Abstract

In many machine learning applications, no labeled training data is available because the data is rare or too expensive to obtain. In these cases, it is desirable to use readily available labeled data that is similar to, but not the same as, the application domain of interest. Transfer learning algorithms are used to build high-performance classifiers when the training data has distribution characteristics different from those of the test data. In a transfer learning environment, validation techniques (such as cross-validation or data splitting) cannot be used to estimate the expected performance of a classifier, because labeled training data from the test domain is lacking. As a result, the area under the receiver operating characteristic curve (AUC) may not be predictive of actual classifier performance. When validation techniques are not possible, the relationship between AUC and classification accuracy is needed to better characterize transfer learning algorithm performance. This paper provides a relative performance analysis of state-of-the-art transfer learning algorithms and traditional machine learning algorithms, examining the correlation between AUC and classification accuracy under domain class imbalance conditions and supporting the findings with statistical analysis.
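The divergence between AUC and accuracy under class imbalance can be illustrated with a minimal sketch (not taken from the paper); the synthetic dataset, logistic regression model, and 95/5 imbalance ratio below are assumptions chosen purely for illustration:

```python
# Minimal sketch (illustrative only, not the paper's method): shows how AUC and
# accuracy can tell different stories on an imbalanced binary problem.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

# Synthetic binary problem with a 95/5 class split to mimic domain imbalance.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]   # scores used for AUC (ranking quality)
pred = clf.predict(X_te)                # hard labels used for accuracy

print("AUC:               %.3f" % roc_auc_score(y_te, proba))
print("Accuracy:          %.3f" % accuracy_score(y_te, pred))
# A trivial majority-class predictor already reaches ~95% accuracy here, so a
# high accuracy number need not reflect the ranking quality captured by AUC.
print("Majority baseline: %.3f" % accuracy_score(y_te, np.zeros_like(y_te)))
```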
