Abstract

Text classification assigns a document to a predefined category. Automatic text classification has been an important application and research topic since the inception of digital documents. In this study, hypernyms (superordinate words) are identified in web pages and combined with entailment rule acquisition. A tree of the hyponym words available in the document is created and used with a dependency tree. Feature extraction is performed with weighted Term Frequency-Inverse Document Frequency (TF-IDF), where the weight of a word is computed from the number of hyponyms present in the radix tree. Performance is evaluated using a Support Vector Machine (SVM) classifier and a Fuzzy Unordered Rule Induction Algorithm (FURIA) classifier.
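
The hyponym-weighted TF-IDF idea can be illustrated with a minimal sketch. The exact weighting formula is not given here, so the code assumes a weight of 1 plus the number of a term's hyponyms found in the same document. The function name `weighted_tfidf` and the plain dictionary `hyponyms` (a stand-in for the radix tree lookup) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def weighted_tfidf(docs, hyponyms):
    """Compute hyponym-weighted TF-IDF vectors.

    docs:     list of token lists, one per document
    hyponyms: dict mapping a hypernym to the set of its hyponym words
              (assumed stand-in for a radix tree lookup)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        present = set(doc)
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            # Assumed weight: 1 + number of the term's hyponyms in this document.
            weight = 1 + len(hyponyms.get(term, set()) & present)
            vec[term] = weight * (count / len(doc)) * idf
        vectors.append(vec)
    return vectors


# Toy data, invented for illustration only.
docs = [["animal", "dog", "cat"], ["animal", "car"], ["car", "road"]]
hyponyms = {"animal": {"dog", "cat"}}
vectors = weighted_tfidf(docs, hyponyms)
```

Under this assumption, "animal" receives a larger weight in the first document, where two of its hyponyms occur, than in the second, where none do. The resulting vectors could then be passed to a classifier such as an SVM.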

Highlights

  • Webpage classification techniques use text in the page, the link structure, hyperlink structure or anchor text information to classify a target page

  • The results show that stemming-based text representation achieved better performance than hypernym-based text representation

  • Results show that HNB performed better than other methods and that Symmetrical Uncertainty (SU) was more competitive than ReliefF in web page categorization


Summary

INTRODUCTION

Webpage classification techniques use the text in the page, the link structure, hyperlink structure or anchor text information to classify a target page. The addition of this enormous amount of data, along with the interactive and content-rich nature of the web, has made it very popular. These pages vary to a great extent in both the content and quality of information. Conceptual structures are defined on ontologies, appropriate to the idea of machine-processable data on the semantic web. Ontologies are data schemas that provide a controlled vocabulary of concepts, each of which includes explicitly defined machine-processable semantics. Ontology learning methods require diverse techniques from different fields such as knowledge acquisition, database management, natural language processing, information retrieval, artificial intelligence and machine learning. This study identifies web page hypernyms (superordinate words) and combines them with entailment rule acquisition to classify web documents.

LITERATURE REVIEW
METHODOLOGY
Findings
CONCLUSION
