Text classification based on a novel ensemble multi-label learning method

Tao Zhang, Jiansheng Wu, Haifeng Hu

doi:10.1109/icsai.2014.7009425

Abstract

Text classification is one of the most important tasks in Natural Language Processing research. In most real-world settings it is a multi-label learning task, since a single document can belong to several categories at once. Three attribute measures are in mainstream use for describing documents: information gain, document frequency, and chi-square test values. Each has been applied successfully to text classification, but each focuses on different information, so ensemble methods that combine them are a promising way to improve prediction performance. In this paper, we propose En-MLKNN, a novel ensemble multi-label learning method built on MLKNN, a state-of-the-art multi-label learning method. To make the best use of this approach, we also construct a complete framework for text classification. Experiments on two classic datasets show that En-MLKNN outperforms most state-of-the-art multi-label learning algorithms.
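To make the three attribute measures concrete, here is a minimal Python sketch that computes one common textbook formulation of each for a single term, using a 2×2 term/label contingency table and treating each label one-vs-rest. The function names `term_scores` and `max_over_labels`, the set-of-words document representation, and the choice to aggregate across labels by taking the maximum are illustrative assumptions, not details taken from the paper; the authors' exact formulas and their En-MLKNN combination scheme are not reproduced here.

```python
import math
from typing import Sequence, Set, Tuple


def entropy(*counts: int) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def term_scores(docs: Sequence[Set[str]], labels: Sequence[bool],
                term: str) -> Tuple[int, float, float]:
    """Return (document frequency, information gain, chi-square) of `term`
    with respect to one binary label. Assumes `docs` is non-empty."""
    n = len(docs)
    # 2x2 contingency table:
    # a = term present & label on,  b = term present & label off
    # c = term absent & label on,   d = term absent & label off
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and not y)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y)
    d = n - a - b - c

    df = a + b  # number of documents that contain the term

    # Information gain: H(label) - H(label | term present or absent)
    h_label = entropy(a + c, b + d)
    h_cond = ((a + b) / n) * entropy(a, b) + ((c + d) / n) * entropy(c, d)
    ig = h_label - h_cond

    # Chi-square statistic of the 2x2 table (defined as 0 if a margin is empty)
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return df, ig, chi2


def max_over_labels(docs: Sequence[Set[str]],
                    label_columns: Sequence[Sequence[bool]],
                    term: str) -> Tuple[int, float, float]:
    """Multi-label case (hypothetical aggregation): score the term against
    every label one-vs-rest and keep the maximum IG and chi-square."""
    scores = [term_scores(docs, col, term) for col in label_columns]
    return scores[0][0], max(s[1] for s in scores), max(s[2] for s in scores)


if __name__ == "__main__":
    # Toy corpus: each document is a set of words; labels are per-label columns
    docs = [{"goal", "match"}, {"goal", "vote"}, {"vote", "election"}, {"match", "team"}]
    sports = [True, False, False, True]
    politics = [False, True, True, False]
    for t in ("match", "goal"):
        print(t, term_scores(docs, sports, t), max_over_labels(docs, [sports, politics], t))
```

In this toy corpus, "match" and "goal" have the same document frequency (2), but only "match" co-occurs perfectly with the sports label, so it scores 1.0 bit of information gain and a chi-square value of 4.0, while "goal" scores 0 on both. This is one small illustration of why the measures capture different information and why combining them can be worthwhile.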
