Abstract

HTML pages contain unstructured and diverse information. However, these documents lack semantics and are not machine understandable. The Semantic Web aims to add formal semantics to web data, and an ontology provides formal semantics for a domain; ontologies are thus considered a foundation of the Semantic Web. Domain ontologies can be constructed manually, but this process is tedious and inefficient. This study therefore presents an ontology learning (OL) model that creates domain ontologies automatically from a set of HTML pages. The key idea of this research is to combine the list structure and the headings of HTML pages to recognize the ontology vocabulary. The approach also incorporates synonym relationships into the ontology and allows the semantic interpretation of ontology concepts. We implement the proposed OL approach to build a sports ontology from a collection of sports-domain HTML documents. The new sports ontology is tested using the FaCT++ reasoner, and the results show no inconsistency in the ontology. Furthermore, experts evaluate how successfully HTML lists and headings are mapped to the ontology vocabulary. The proposed OL approach achieves precision values of 92.7% and 95.4% for list and heading mapping, respectively.

Highlights

  • HTML is a markup language that is used to write web pages over the World Wide Web [1]

  • We evaluated our approach by using the sports domain dataset, which consists of 105 HTML documents collected from the https://www.sports.ru website

  • We initially evaluated the new ontology learned by our ontology learning (OL) model by using a semantic reasoner

Summary

INTRODUCTION

HTML is a markup language used to write web pages on the World Wide Web [1]. It consists of elements called tags, which have fixed definitions. Web browsers interpret these tags and display the web pages. Many web applications, in areas such as data mining, machine learning, artificial intelligence, and natural language processing, facilitate the retrieval of information from web pages to fulfill user information requirements [2,3,4]. The vision of the Semantic Web is to make HTML documents understandable by machines. To achieve this vision, a formal way of representing semantics is required. Ontology has emerged as an approach for representing the machine-understandable semantics of a domain and is currently considered the heart of semantic web technologies [5].
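To make the list-and-heading idea from the abstract concrete, the following is a minimal sketch of how heading text and list-item text might be pulled out of a single HTML page using only Python's standard `html.parser` module. It is not the model described in this paper: the class name `ListHeadingCollector`, the sample page, and the use of list nesting depth as a rough hint for concept hierarchy are all assumptions made for illustration.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ListHeadingCollector(HTMLParser):
    """Hypothetical helper: collect heading and list-item text from one page."""

    def __init__(self):
        super().__init__()
        self.headings = []    # (heading level, text) pairs in document order
        self.list_items = []  # (list nesting depth, text) pairs in document order
        self._depth = 0       # number of <ul>/<ol> elements currently open
        self._target = None   # "heading", "item", or None
        self._level = 0       # level of the heading being read
        self._buffer = []     # text fragments of the element being read

    def _flush(self):
        # Store the buffered text (whitespace-normalized) and reset the state.
        text = " ".join("".join(self._buffer).split())
        if text and self._target == "heading":
            self.headings.append((self._level, text))
        elif text and self._target == "item":
            self.list_items.append((self._depth, text))
        self._buffer = []
        self._target = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()
            self._target, self._level = "heading", int(tag[1])
        elif tag in ("ul", "ol"):
            self._flush()  # a nested list ends the parent item's own text
            self._depth += 1
        elif tag == "li":
            self._flush()
            self._target = "item"

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS or tag == "li":
            self._flush()
        elif tag in ("ul", "ol"):
            self._flush()
            self._depth = max(0, self._depth - 1)

    def handle_data(self, data):
        if self._target is not None:
            self._buffer.append(data)


# Illustrative page; the content is invented for this sketch.
sample = """
<h1>Sports</h1>
<ul>
  <li>Football
    <ul><li>Goalkeeper</li><li>Striker</li></ul>
  </li>
  <li>Tennis</li>
</ul>
<h2>Events</h2>
"""

collector = ListHeadingCollector()
collector.feed(sample)
collector.close()
print(collector.headings)
print(collector.list_items)
```

In this sketch, the depth recorded for each list item (1 for `Football`, 2 for `Goalkeeper`) is only a candidate signal for a parent/child relation between terms; deciding which terms become ontology concepts and how they relate is the job of the OL model itself.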

Textual-based OL Techniques
Knowledge-based OL Techniques
Semistructured-based Techniques
List Extractor
Heading Extractor
Hierarchy Identification
List and Heading Merger
Add Synonyms
IMPLEMENTATION AND RESULTS
Evaluation Measures
Result Analysis
CONCLUSION