Transfer learning with reasonable boosting strategy

Lei La, Qiao Guo, Qimin Cao, Yongliang Wang

doi:10.1007/s00521-012-1297-3

Abstract

The lack of labeled data is a serious problem that greatly hinders the application of text classification in new domains. In this era of information explosion, traditional classification methods, which depend on labeled data, become ineffective in newly emerged domains. Transfer learning makes it possible to use labeled, identically distributed data from old domains for data mining in new domains. However, previous algorithms and practical application systems still leave room for improvement. This paper presents a novel, complete method for text categorization (TC) in new domains where labeled data are insufficient. We first present an improved weighting strategy for the boosting family of algorithms so that training data can be used more efficiently. We then introduce boosting with the novel weighting strategy into transfer learning and propose a novel text classification algorithm that can use labeled data from old domains to classify new-domain data with high performance. After a mathematical discussion of the proposed algorithm, we deploy a real-world system based on it to evaluate the method. Experimental results demonstrate that our method achieves both high accuracy and high efficiency in TC when dealing with cross-domain problems.

Full Text