Abstract

Text classification is widely used to help users discover useful information on the Internet. However, current text classification systems are based on the “Bag of Words” (BOW) representation, which accounts only for term frequency in the documents and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich the text representation by means of manual intervention or automatic document expansion. Unfortunately, the improvement achieved is very limited, owing to the poor coverage of the dictionary and the ineffectiveness of term expansion. DBpedia, which has appeared recently, contains rich semantic information. In this paper, we propose a method that compiles DBpedia knowledge into the document representation to improve text classification. It integrates the rich knowledge of DBpedia into text documents by resolving synonyms and introducing more general and associative concepts. To evaluate the proposed method, we performed an empirical evaluation using an SVM classifier on several real data sets. The experimental results show that the proposed framework, which integrates hierarchical, synonym, and associative relations with traditional text similarity measures based on the BOW model, significantly improves text classification performance.
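
As a rough illustration of the idea, the following Python sketch enriches a bag-of-words vector with synonym, hierarchical, and associative information before computing a cosine similarity. It is not the method of the paper, whose details the abstract does not give. The three small dictionaries are hypothetical stand-ins for DBpedia lookups, and the function names and the weights `broader_weight` and `related_weight` are assumptions chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base standing in for DBpedia lookups.
SYNONYMS = {"automobile": "car", "auto": "car"}      # surface term -> canonical term
BROADER = {"car": ["vehicle"], "truck": ["vehicle"]}  # term -> more general concepts
RELATED = {"car": ["engine"], "truck": ["engine"]}    # term -> associative concepts


def plain_bow(tokens):
    """Plain bag of words: raw term frequencies only."""
    return Counter(tokens)


def enrich(tokens, broader_weight=0.5, related_weight=0.25):
    """Bag of words with synonyms merged and related concepts added.

    The weights are illustrative assumptions, not values from the paper.
    """
    bag = Counter()
    for token in tokens:
        term = SYNONYMS.get(token, token)  # resolve synonyms to one term
        bag[term] += 1.0
        for concept in BROADER.get(term, []):  # add more general concepts
            bag[concept] += broader_weight
        for concept in RELATED.get(term, []):  # add associative concepts
            bag[concept] += related_weight
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = ["automobile", "repair"]
doc_b = ["truck", "repair"]

plain_similarity = cosine(plain_bow(doc_a), plain_bow(doc_b))
enriched_similarity = cosine(enrich(doc_a), enrich(doc_b))
```

In this toy case the two documents share only the word "repair" under plain BOW, while the enriched vectors also overlap on the shared general and associative concepts, so their similarity is higher. In a full pipeline, vectors of this kind could serve as input to an SVM classifier, which is left out here.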

