Abstract
Text classification assigns a document to a predefined category. Automatic text classification has been an important application and research topic since the inception of digital documents. In this study, hypernyms (superordinate words) are identified in web pages and combined with entailment rule acquisition. A tree of the hyponym words available in the document is created and used together with a dependency tree. Feature extraction is performed with weighted Term Frequency-Inverse Document Frequency (TF-IDF), where the weight of a word is computed from the number of hyponyms present in the radix tree. Performance is evaluated using a Support Vector Machine (SVM) classifier and a Fuzzy Unordered Rule Induction Algorithm (FURIA) classifier.
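The weighted TF-IDF step can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the code below assumes, purely for illustration, a weight of 1 plus the number of hyponyms of the word. The function name `weighted_tfidf` and the `hyponym_counts` dictionary (a stand-in for counts that the study reads from its radix tree) are hypothetical, as are the toy documents.

```python
import math
from collections import Counter


def weighted_tfidf(docs, hyponym_counts):
    """Return one {term: score} dict per document.

    docs: list of token lists.
    hyponym_counts: dict mapping a word to its number of hyponyms
        (hypothetical stand-in for a lookup in the radix tree).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            # Assumed weighting: 1 + hyponym count (not from the source).
            weight = 1 + hyponym_counts.get(term, 0)
            vec[term] = weight * (count / len(doc)) * idf
        vectors.append(vec)
    return vectors


# Toy corpus: "animal" and "vehicle" act as hypernyms with two hyponyms each.
docs = [
    ["dog", "animal", "cat"],
    ["car", "vehicle", "truck"],
    ["animal", "vehicle", "news"],
]
hyponym_counts = {"animal": 2, "vehicle": 2}
vectors = weighted_tfidf(docs, hyponym_counts)
```

Under this assumed weighting, a superordinate word such as "animal" scores higher than a rarer word in the same document, although plain TF-IDF would rank it lower because it appears in more documents. The resulting vectors could then be passed to a classifier such as an SVM.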
Highlights
Webpage classification techniques use text in the page, the link structure, hyperlink structure or anchor text information to classify a target page
The results show that stemming-based text representation achieved better performance than hypernym-based text representation
Results show that HNB performed better than other methods and that Symmetrical Uncertainty (SU) was more competitive than ReliefF in web page categorization
Summary
Webpage classification techniques use text in the page, the link structure, hyperlink structure or anchor text information to classify a target page. The addition of this enormous amount of data, along with the interactive and content-rich nature of the web, has made it very popular. These pages vary to a great extent in both the content and quality of information. Ontologies define conceptual structures suited to the idea of machine-processable data on the semantic web. Ontologies are data schemas that provide a controlled vocabulary of concepts, each with explicitly defined machine-processable semantics. Ontology learning methods require diverse techniques from different fields like knowledge acquisition, database management, natural language processing, information retrieval, artificial intelligence and machine learning. This study identifies web page hypernyms (superordinate words) and combines them with entailment rule acquisition to classify web documents.
More From: Research Journal of Applied Sciences, Engineering and Technology