Abstract

Bag-of-Words with TF-IDF or other weighting schemes is a commonly adopted method for document representation. However, such representations fail to capture the sequential or semantic information in a sentence, and they lead to high-dimensional vectors because misspellings, acronyms, and similar variants each introduce a new vocabulary dimension. Distributed word embeddings, and even document embeddings, have been proposed to encode semantic or contextual information. However, the quality of the resulting representations is not always good. To relieve the problems mentioned above, we propose a high-quality document representation model that takes into account word morphology as well as the semantic and sequential information of the global context. The proposed model outperforms state-of-the-art traditional methods, word-embedding-based models, and character-aware models on a text classification task.
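To make the two Bag-of-Words limitations above concrete, here is a minimal sketch of TF-IDF weighting. It assumes raw-count term frequency and an unsmoothed log inverse document frequency, and the example documents are hypothetical. It illustrates the baseline being criticized, not the proposed model, whose details the abstract does not give.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF vectors for tokenized documents.

    Assumption: TF is the raw token count, IDF is log(N / df) with no smoothing.
    Returns the sorted vocabulary and one dense vector per document.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical documents: a misspelling ("graet") and an abbreviation ("gr8")
# each become a separate dimension, unrelated to "great".
docs = [
    "the movie was great".split(),
    "the movie was graet".split(),
    "the film was gr8".split(),
]
vocab, vecs = tfidf(docs)
print(vocab)

# Word order is discarded: a reversed document gets exactly the same vector.
_, order_vecs = tfidf([docs[0], list(reversed(docs[0])), docs[2]])
print(order_vecs[0] == order_vecs[1])
```

The vocabulary grows with every surface variant, and the reversed sentence is indistinguishable from the original, which are the dimensionality and sequence problems the abstract attributes to Bag-of-Words representations.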
