Abstract

The lack of labeled data is a serious problem that greatly hinders the application of text classification in new domains. In this era of information explosion, traditional classification methods, which depend on labeled data, become ineffective in newly emerging domains. Transfer learning makes it possible to use identically distributed labeled data from old domains for data mining in new domains. However, previous algorithms and practical systems still leave considerable room for improvement. This paper presents a novel, complete method for text categorization (TC) in new domains where labeled data are insufficient. We first present an improved weighting strategy for the boosting family of algorithms so that training data can be used more efficiently. We then introduce boosting with this weighting strategy into transfer learning and propose a novel text classification algorithm that can use labeled data from old domains to classify new-domain data with high performance. After a mathematical discussion of the proposed algorithm, we deploy a real-world system based on it to evaluate the method. Experimental results demonstrate that our method achieves both high accuracy and efficiency in TC when dealing with cross-domain problems.

