Abstract

In information technology, an ontology is a knowledge structure consisting of the definitions and relations of information within one or more domains. This semantically represented information is helpful for tasks such as document classification and item recommendation in recommender systems. However, as big data prevails, manually extending existing ontologies with up-to-date terminology becomes challenging because the process is tedious and time-consuming and expert labor is expensive. This study aims to achieve fully automatic ontology extension. We propose a novel "Direct" approach for extending the existing Computer Science Ontology (CSO). This approach consists of two steps: first extending the CSO with new topics, and then using the extended graph to obtain the new topics' node embeddings as inputs for training classifiers. Because this initial extension still contains many noisy links, the classifier then acts as both a filter and a link predictor. We experiment with various traditional machine learning and recent deep learning models and then compare them using our Direct approach. We also propose two evaluation procedures to decide the best-performing model and approach: the novel Wikipedia-based F1w score and the total number of resulting links. Furthermore, manual evaluation by four human experts is conducted to assess the reliability of our proposed approach and evaluation procedure. This study concludes that the Direct approach's Gaussian Naive Bayes model produces the most valid and reliable links, and we therefore use it to further extend the CSO with hundreds of new CS topics and links.

Highlights

  • Organizing information is becoming more crucial due to the rapid growth of information

  • We propose a “Direct” automatic ontology extension approach

  • The extension targets the Computer Science Ontology (CSO), which does not have the traditional format of a taxonomy


Summary

Introduction

Organizing information is becoming more crucial due to the rapid growth of information. Proper organization allows easier access to specific items within a vast collection. An ontology is a popular data structure for storing information efficiently; it stores concepts and the relationships among them. A major problem is maintaining the ontology itself: ontologies must be kept up to date while minimizing human intervention. Maintenance means updating an existing ontology with new information, which includes adding new topics and relationships. Several methodologies exist for ontology maintenance, which often include corpus annotation, training an information extraction engine, and validation by human experts.
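To make the notion of maintenance concrete, the sketch below models an ontology as a set of topics plus (subject, relation, object) links, and treats maintenance as adding candidate links that pass a validation step. It is only an illustration under stated assumptions: the `Ontology` class, the `broader_than` relation name, the example topics, and the `keep` callback are all hypothetical and are not taken from the CSO or from the paper's method. The `keep` callback stands in for whatever validates a link, whether a human expert or a trained filter.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: a set of topics plus (subject, relation, object) links."""
    topics: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def add_link(self, subject, relation, obj):
        """Add a relationship, registering any topic not seen before."""
        self.topics.update((subject, obj))
        self.links.add((subject, relation, obj))

    def extend(self, candidate_links, keep):
        """Add the candidate links accepted by `keep`; return how many were added."""
        added = 0
        for link in candidate_links:
            if link not in self.links and keep(link):
                self.add_link(*link)
                added += 1
        return added


# Hypothetical starting ontology with a single link.
onto = Ontology()
onto.add_link("machine learning", "broader_than", "neural networks")

# Candidate links for new topics; the second one is deliberately noisy.
candidates = [
    ("machine learning", "broader_than", "graph neural networks"),
    ("machine learning", "broader_than", "conference dinner"),
]

# Hypothetical validator: reject links that mention the noisy topic.
added = onto.extend(candidates, keep=lambda link: "conference dinner" not in link)
```

Keeping validation behind a callback separates the extension step from the decision of which links to trust, so a manual check and an automatic filter can be swapped without changing the data structure.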

