Abstract
In this paper, we investigate how effective semantic information can be in text categorization tasks. To this end, we consider distinct document representations that differ in the kind of information they incorporate: (a) information about terms only, (b) semantic information, and (c) a combination of both. We also study how reducing the vocabulary size affects this task. Categorization was performed with the k Nearest Neighbours (kNN) method, and the vocabulary size was reduced by means of the Information Gain technique. A number of different document codifications were tested. The experimental results show that including semantic information in syntactically and semantically richer corpora can improve text categorization, provided that the vocabularies contain a sufficient number of features. In our view, however, it cannot yet be affirmed that introducing semantic information ensures an improvement in text categorization. Our results suggest that performance depends heavily on the particular characteristics of the corpus used in each case.
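To make the pipeline in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. Documents are token lists. The semantic features come from a hypothetical term-to-concept table `CONCEPTS`, because the abstract does not name the semantic resource. Information Gain is computed on binary term-presence features, and kNN uses cosine similarity with a similarity-weighted vote. All function names, the toy data, and the choice of k are illustrative.

```python
import math
from collections import Counter

# Hypothetical term -> concept table standing in for a semantic resource
# (e.g. WordNet synsets); the abstract does not say which resource was used.
CONCEPTS = {
    "goal": "c:sport_event", "match": "c:sport_event",
    "stock": "c:asset", "bank": "c:institution",
}

def represent(tokens, mode, concepts=CONCEPTS):
    """Build representation (a) 'terms', (b) 'concepts' or (c) 'both'."""
    concept_tokens = [concepts[t] for t in tokens if t in concepts]
    if mode == "terms":
        return list(tokens)
    if mode == "concepts":
        return concept_tokens
    if mode == "both":
        return list(tokens) + concept_tokens
    raise ValueError(f"unknown mode: {mode}")

def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(doc_sets, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the category."""
    n = len(labels)
    with_t = [lab for d, lab in zip(doc_sets, labels) if term in d]
    without_t = [lab for d, lab in zip(doc_sets, labels) if term not in d]
    conditional = sum(len(s) / n * entropy(s) for s in (with_t, without_t) if s)
    return entropy(labels) - conditional

def select_vocabulary(docs, labels, size):
    """Keep the `size` features with the highest IG (ties broken alphabetically)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    ranked = sorted(vocab, key=lambda t: (-information_gain(doc_sets, labels, t), t))
    return ranked[:size]

def vectorize(tokens, vocab):
    """Term-frequency vector restricted to the reduced vocabulary."""
    keep = set(vocab)
    return Counter(t for t in tokens if t in keep)

def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(train_vecs, train_labels, query_vec, k=3):
    """Similarity-weighted vote among the k most cosine-similar training documents."""
    neighbours = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )[:k]
    votes = Counter()
    for vec, label in neighbours:
        votes[label] += cosine(query_vec, vec)
    return votes.most_common(1)[0][0]

# Toy usage with the combined representation (c); the data are invented.
train_tokens = [
    ["goal", "match", "team"], ["team", "league", "score"],
    ["stock", "market", "price"], ["market", "bank", "price"],
]
train_labels = ["sport", "sport", "finance", "finance"]
train_docs = [represent(d, "both") for d in train_tokens]
vocab = select_vocabulary(train_docs, train_labels, size=3)
train_vecs = [vectorize(d, vocab) for d in train_docs]
print(vocab)  # ['market', 'price', 'team']
query = vectorize(represent(["team", "score", "goal"], "both"), vocab)
print(knn_classify(train_vecs, train_labels, query))  # sport
```

Passing `"terms"` or `"concepts"` to `represent` gives representations (a) and (b). Varying `size` mimics the vocabulary-reduction experiments. With so small a `size`, the concept features are discarded here, which echoes the abstract's caveat that semantic information helps only when enough features are kept.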