Abstract

Text categorisation is an emerging area of text mining. Because text documents are unstructured, they yield a very large number of features. This paper proposes MC-LSW, an algorithm for multi-label categorisation of text documents based on lexical (token) and semantic concepts of a language, using WordNet. The algorithm aims to minimise the number of tokens used to categorise text documents, and it uses WordNet to extract the semantic information of tokens. MC-LSW is implemented and tested on five text datasets and compared with existing multi-label categorisation algorithms. It shows more efficient and promising results than the existing methods in terms of space and time complexity. It also improves accuracy and precision and reduces Hamming loss.
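For readers unfamiliar with Hamming loss, the multi-label measure mentioned above, the following is a minimal sketch of its standard definition: the fraction of (document, label) pairs that are assigned incorrectly. It is not taken from the paper. The function name, the example labels, and the example predictions are all hypothetical and serve only to illustrate the calculation.

```python
from typing import List, Set


def hamming_loss(
    true_labels: List[Set[str]],
    predicted_labels: List[Set[str]],
    all_labels: Set[str],
) -> float:
    """Return the fraction of (document, label) pairs predicted incorrectly.

    Assumes every label in the true and predicted sets belongs to all_labels.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have equal length")
    if not true_labels or not all_labels:
        raise ValueError("need at least one document and one label")

    wrong = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        # The symmetric difference counts labels that were missed
        # plus labels that were assigned but should not have been.
        wrong += len(truth ^ predicted)

    return wrong / (len(true_labels) * len(all_labels))


# Hypothetical example: three documents and four possible categories.
labels = {"sports", "politics", "tech", "health"}
truth = [{"sports"}, {"politics", "tech"}, {"health"}]
predicted = [{"sports", "tech"}, {"politics"}, {"health"}]

loss = hamming_loss(truth, predicted, labels)
print(f"Hamming loss: {loss:.4f}")
```

In this example two of the twelve (document, label) pairs are wrong: one extra label on the first document and one missed label on the second. A lower value indicates better multi-label predictions, which is why a reduction in Hamming loss counts as an improvement.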
