Abstract

Multi-label document classification is an important challenge with many real-world applications. Multi-label ranking is a common approach to multi-label classification. However, existing works usually suffer from incomplete, context-free representations and from non-automatic, part-based model implementations. To solve these problems, we propose LSTM\(^2\), a long short-term memory (LSTM) model for document classification. The model consists of two steps. The first is the repLSTM process, which is based on a supervised LSTM and introduces the document labels to learn the document representation. The second is the rankLSTM process, in which the order of each document's labels is rearranged in accordance with a semantic tree, which better exploits the strengths of the LSTM on sequences. Moreover, because the labels are predicted serially, the model can be trained as a whole. In addition, Connectionist Temporal Classification is used in this process; it is a good solution for dealing with error propagation when the output has variable length (the number of labels in each document). Experiments on three generalization datasets have achieved good results.
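
To illustrate how Connectionist Temporal Classification yields an output of variable length, the following is a minimal sketch of greedy CTC decoding in plain Python. It is not the implementation used in this paper: the function name `ctc_greedy_decode`, the blank index `0`, and the toy score matrix are assumptions made only for illustration. The sketch picks the best symbol at each time step, collapses consecutive repeats, and drops blanks, so the number of predicted labels is not fixed in advance.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(
    step_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Collapse per-step scores into a variable-length label sequence."""
    # Pick the highest-scoring symbol at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in step_scores
    ]

    decoded: List[int] = []
    previous = None
    for symbol in best_path:
        # Keep a symbol only if it differs from the previous step and is not blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return decoded


# Toy scores over {blank, label 1, label 2} for four time steps (made-up values).
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 repeated, collapsed into the previous one
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.20, 0.70],  # label 2
]
labels = ctc_greedy_decode(scores)
print(labels)
```

Here four time steps reduce to two labels; a blank between two identical symbols would keep both, which is how the same decoder can emit a different number of labels for each document.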
