Abstract

Text classification is a fundamental problem in natural language processing. Multi-label text classification, in which an instance can be associated with more than one label, is a particularly challenging variant. This paper presents a multi-label annotation and classification methodology for Arabic text data that is not currently annotated as multi-label, with the aim of analyzing and comparing the performance of various multi-label learning approaches. The work comprises two phases. In the first phase, hotel reviews are automatically annotated with more than one label based on the aspects found in the reviews, using clusters of extracted seed keyphrases. In the second phase, experiments compare the performance of various multi-label classification methods. The models include a feed-forward neural network that learns a vector representation based on the bi-gram alphabet rather than the commonly used bag-of-words model. The bi-gram alphabet vector representation has two advantages: reduced feature dimensionality and no dependence on natural language processing tools. The results indicate that the feed-forward neural network with the bi-gram alphabet vector representation is a competitive solution for the multi-label text classification problem, achieving an accuracy of about 75.2% with a standard deviation of 0.062.
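To make the bi-gram alphabet representation concrete, the following is a minimal sketch under one plausible reading of the abstract: each review is mapped to counts of adjacent character pairs drawn from a fixed alphabet. The 28-letter alphabet, the function names `build_bigram_index` and `bigram_vector`, and the choice to skip pairs containing out-of-alphabet characters are all assumptions made for illustration; the abstract does not specify the paper's exact alphabet or preprocessing.

```python
from itertools import product

# Assumption: the 28 basic Arabic letters. The paper's actual alphabet
# (e.g. hamza forms, ta marbuta, digits) is not given in the abstract.
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def build_bigram_index(alphabet):
    """Map every ordered pair of alphabet characters to a feature position."""
    return {a + b: i for i, (a, b) in enumerate(product(alphabet, repeat=2))}


def bigram_vector(text, index):
    """Count character bi-grams in `text`.

    Pairs that contain a character outside the alphabet (spaces,
    punctuation, Latin letters) are skipped. This is an assumption,
    not a detail confirmed by the abstract.
    """
    vec = [0] * len(index)
    for a, b in zip(text, text[1:]):
        pos = index.get(a + b)
        if pos is not None:
            vec[pos] += 1
    return vec


INDEX = build_bigram_index(ALPHABET)

# Hypothetical two-word review ("nice hotel"), used only as sample input.
features = bigram_vector("فندق جميل", INDEX)
```

Under these assumptions the feature length is fixed at the square of the alphabet size, independent of vocabulary size, and no tokenizer or stemmer is needed, which is in line with the advantages the abstract claims. A vector like `features` would then be the input to a feed-forward network with one sigmoid output per label.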
