Smart Information Retrieval: Domain Knowledge Centric Optimization Approach

Abduladem Aljamel, Taha Osman, Giovanni Acampora, Ziqi Zhang, Autilia Vitiello

doi:10.1109/access.2018.2885640

Abstract

In the age of the Internet of Things, online data have grown significantly in volume and diversity, and information retrieval has become an important theme in Internet-oriented data science research. This paper introduces a novel domain knowledge centric methodology aimed at improving the accuracy of machine learning methods for relation extraction from text data, which is critical to the accuracy and efficiency of information retrieval-based applications, including recommender systems and sentiment analysis. The proposed methodology contributes to several processes of domain knowledge-based relation extraction: interrogating Linked Open Datasets to generate the relation classification training data, addressing imbalanced classification in the training datasets, determining the probability threshold of the best learning algorithm, and establishing the optimum parameters for the genetic algorithms used to optimize feature selection for the learning algorithms. The experimental evaluation of the proposed methodology reveals that the adopted machine-learning algorithms exhibit higher precision and recall in relation extraction in the reduced feature space optimized by our implementation. The machine-learning algorithms considered are support vector machines, the perceptron algorithm with uneven margins, and K-nearest neighbors. The outcome is verified by comparing against the random mutation hill-climbing optimization algorithm using Wilcoxon signed-rank statistical analysis.
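The Wilcoxon signed-rank comparison mentioned at the end of the abstract can be illustrated with a minimal sketch. It computes only the two signed-rank sums for paired scores and is not the authors' analysis code. The function name `wilcoxon_signed_rank` and the score lists are hypothetical, and the numbers are illustrative values, not results from the paper.

```python
def wilcoxon_signed_rank(a, b):
    """Return (w_plus, w_minus), the signed-rank sums for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    # Zero differences carry no sign information and are discarded.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Sort indices by absolute difference, then assign 1-based ranks,
    # giving tied absolute differences their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Illustrative paired scores (percent), one pair per hypothetical experiment run.
ga_scores = [82, 79, 85, 80]
rmhc_scores = [80, 79, 87, 75]
print(wilcoxon_signed_rank(ga_scores, rmhc_scores))  # (4.5, 1.5)
```

The smaller of the two sums is the usual test statistic; turning it into a p-value requires the signed-rank distribution or a normal approximation, which this sketch leaves out.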

Highlights

The Internet of Things (IoT) paradigm is increasing the amount of data being made available online [1], [2]
We conclude that the three Machine Learning (ML) algorithms require approximately the same number of iterations to reach the optimal fitness value and that 100 iterations are sufficient for the Genetic Algorithms (GA) to achieve that goal
These results are consistent with the findings of Wang et al. [41], who noted that entity features improve performance because the relation mentioned between two entities is closely related to the entity types
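To make the idea of GA-driven feature selection concrete, here is a minimal sketch under stated assumptions. Each individual is a bit mask over the feature set and the GA runs for a fixed number of iterations (100 by default, echoing the highlight above). The `fitness` function is a toy stand-in: in a real setting it would be a classifier's score on the selected features, and all names, rates, and the choice of tournament selection and one-point crossover are illustrative assumptions, not the authors' configuration.

```python
import random


def fitness(mask, relevant):
    """Toy fitness (assumption): reward relevant features, penalise extra ones."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in relevant)
    return hits - 0.5 * extras


def genetic_feature_selection(n_features, relevant, pop_size=20, generations=100,
                              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask; returns the best mask found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, relevant))

    def tournament():
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a, relevant) >= fitness(b, relevant) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=lambda m: fitness(m, relevant))
    return best


relevant = {0, 3, 7}  # hypothetical indices of informative features
mask = genetic_feature_selection(12, relevant)
print(mask, fitness(mask, relevant))
```

Because of elitism the best fitness never decreases between iterations, which makes it easy to log the fitness per iteration and see where it plateaus.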

Summary

Introduction

The Internet of Things (IoT) paradigm is increasing the amount of data being made available online [1], [2]. In our implementation of machine-learning-based relation classification, domain-specific knowledge is used to compile some of our training datasets by drawing on relation mentions that feature as ground facts in public Linked Open Data (http://www.linkeddata.org) datasets such as DBpedia and Freebase. This alleviates the manual annotation effort for relation extraction, which can be time-consuming and cumbersome [13]. Examples of random search implementations include evolutionary algorithms, simulated annealing, and random mutation hill-climbing.
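The idea of compiling training data from ground facts can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the facts are hard-coded hypothetical triples rather than the result of a real DBpedia or Freebase query, and the helper names `label_sentences` and `_mentions` are invented for this example. A sentence is labelled with a relation whenever it mentions both entities of a known fact.

```python
import re

# Hypothetical ground facts of the kind a Linked Open Data query might return:
# (subject label, relation, object label).
GROUND_FACTS = [
    ("Acme Corp", "headquarteredIn", "Springfield"),
    ("Jane Doe", "chiefExecutiveOf", "Acme Corp"),
]


def _mentions(sentence, label):
    """True if the entity label occurs in the sentence as a whole phrase."""
    pattern = r"\b" + re.escape(label) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def label_sentences(sentences, facts):
    """Return (sentence, subject, object, relation) for every matching fact."""
    examples = []
    for sentence in sentences:
        for subj, relation, obj in facts:
            if _mentions(sentence, subj) and _mentions(sentence, obj):
                examples.append((sentence, subj, obj, relation))
    return examples


sentences = [
    "Acme Corp is based in Springfield.",
    "Jane Doe visited Paris.",
]
for example in label_sentences(sentences, GROUND_FACTS):
    print(example)
# ('Acme Corp is based in Springfield.', 'Acme Corp', 'Springfield', 'headquarteredIn')
```

Labels produced this way can be noisy, since a sentence may mention both entities without expressing the relation, so in practice some filtering or manual spot-checking is usually still needed.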
