Abstract

Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more extensively, so that greater benefits can be obtained. There are many techniques for classifying instances of concepts, two popular ones being statistical and data mining methods. This paper compares the two approaches for classifying instances to enrich an ontology with more domain knowledge, using conditional random fields as the statistical method and feature-weight k-nearest neighbor classification as the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields achieve higher precision and recall than feature-weight k-nearest neighbor classification: the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weight k-nearest neighbor classification.

Highlights

  • An ontology consists of concepts in a domain of interest, such as tourism, medicine, and agriculture

  • Ontologies can be applied in various systems and subsystems that require the in-depth meaning of information, for example, information retrieval and recommendation systems

  • This paper identifies the boundaries of named entities (NE) and classifies their types using two recognition techniques: Conditional Random Fields (CRFs) and feature-weight kNN classification, a supervised learning method that learns from class-labelled examples

Summary

INTRODUCTION

An ontology consists of concepts in a domain of interest, such as tourism, medicine, and agriculture. Among the many techniques for classifying instances of concepts, two popular ones are statistical methods and data mining (classification) methods. This paper compares one technique from each family for extracting ontology instances: Conditional Random Fields (CRFs) for the statistical methods and feature-weight k-Nearest Neighbor (KNN) classification for the data mining methods (Imsombut and Sirikayon 2016; Imsombut and Paireekreng 2016). KNN, one of many classification techniques in data mining, is selected in this paper because the features of the data are normally of nominal and boolean types. These features consist of the words that usually surround the word of interest.
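To make the idea of feature-weight KNN over nominal and boolean context features concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a weighted overlap distance (the sum of the weights of the features on which two instances disagree) and a simple majority vote. The feature layout (previous word, next word, capitalization flag), the class labels, the toy examples, and the weight values are all hypothetical and chosen only for illustration; the summary does not state how the paper derives its feature weights.

```python
from collections import Counter


def weighted_overlap_distance(a, b, weights):
    """Sum the weights of the features on which two instances disagree."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def knn_classify(query, examples, weights, k=3):
    """Return the majority label among the k nearest labelled examples."""
    ranked = sorted(
        examples,
        key=lambda ex: weighted_overlap_distance(query, ex[0], weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (previous word, next word, is_capitalized).
# Labels and data are illustrative only, not taken from the paper.
examples = [
    (("visit", "temple", True), "ATTRACTION"),
    (("visit", "beach", True), "ATTRACTION"),
    (("near", "temple", True), "ATTRACTION"),
    (("stay", "hotel", True), "ACCOMMODATION"),
    (("stay", "resort", True), "ACCOMMODATION"),
    (("book", "hotel", False), "ACCOMMODATION"),
]

query = ("stay", "temple", True)

# Assumed weights: the previous word counts more than the next word,
# which in turn counts more than the capitalization flag.
weights = (2.0, 1.0, 0.5)
predicted = knn_classify(query, examples, weights, k=3)

# With uniform weights the same query is resolved differently.
predicted_uniform = knn_classify(query, examples, (1.0, 1.0, 1.0), k=3)

print(predicted, predicted_uniform)
```

In this toy data the weighting changes the outcome: emphasizing the previous word pulls the query toward the examples that share "stay", whereas uniform weights let the shared next word "temple" decide the vote. This is the sense in which per-feature weights matter when all features are nominal or boolean.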

RELATED WORKS
Statistical Techniques
Data Mining Techniques
Benefits and Limitations of each Method
DATA AND EXPERIMENTS
RESULTS AND DISCUSSION
CONCLUSION