Abstract

The blending of the digital and physical worlds has changed the Internet significantly. Accordingly, the ways in which information is collected, accessed, and delivered over the Web have changed as well. These changes have raised new problems for information retrieval. Search engines retrieve requested information based on the provided keywords, which is not an efficient way to retrieve rich information. Consequently, fetching the required information is difficult without understanding the syntax and semantics of the content. Multiple existing approaches attempt to resolve this problem by exploiting linked data and semantic Web techniques. Such approaches serialize the content using the Resource Description Framework (RDF) and process queries using SPARQL. However, they require an exact match between the RDF content and the query structure. Although this improves on keyword-based search, it does not provide probabilistic reasoning to estimate how accurately the results relate to the query. From this perspective, this paper proposes a machine learning (random forest) based approach that predicts the fetching status of RDFs by treating RDF requests as a classification problem. First, we preprocess the RDFs to convert them into N-Triples format. Then, a feature vector is constructed for each RDF from its preprocessed form. After that, a random forest classifier is trained to predict the fetching status of RDFs. The proposed approach is evaluated on the open-source DBpedia dataset. The 10-fold cross-validation results indicate that the proposed approach is accurate and surpasses the state of the art.
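The preprocessing and feature-construction steps described in the abstract can be illustrated with a minimal sketch. It assumes RDF content that is already serialized as N-Triples and uses a simplified line pattern, not the full N-Triples grammar. The function names (`parse_ntriples`, `feature_vector`), the four features, and the `example.org` sample data are hypothetical and chosen only for illustration; the abstract does not list the features the paper actually uses.

```python
import re

# Simplified pattern for one N-Triples line: subject, predicate, object, final dot.
# Assumption: this covers IRIs, blank nodes, and plain/tagged/typed literals only.
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples; skip unparsable lines."""
    triples = []
    for line in text.splitlines():
        # Ignore blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            triples.append(match.groups())
    return triples


def feature_vector(triples):
    """Hypothetical features, not the paper's:
    [triple count, distinct predicates, literal objects, IRI objects]."""
    predicates = {p for _, p, _ in triples}
    n_literals = sum(1 for _, _, o in triples if o.startswith('"'))
    n_iris = sum(1 for _, _, o in triples if o.startswith("<"))
    return [len(triples), len(predicates), n_literals, n_iris]


# Made-up sample data using example.org identifiers.
SAMPLE = '''
<http://example.org/resource/A> <http://www.w3.org/2000/01/rdf-schema#label> "A"@en .
<http://example.org/resource/A> <http://example.org/ontology/relatedTo> <http://example.org/resource/B> .
<http://example.org/resource/A> <http://example.org/ontology/size> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .
'''

if __name__ == "__main__":
    print(feature_vector(parse_ntriples(SAMPLE)))
```

In a full pipeline, one such vector per RDF, paired with its fetching-status label, would form the training set for a random forest classifier evaluated with 10-fold cross-validation, for example using scikit-learn's `RandomForestClassifier` and `cross_val_score`.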

Highlights

  • The digital age brings a set of challenges for the Web because of the abundance of information

  • Existing linked data and semantic Web approaches respond to queries only with an exact match rather than estimating the similarity within the Resource Description Framework (RDF) content, which motivates an automatic solution for fetching status prediction

  • We propose a machine learning based approach (RFSearch) for fetching status prediction of RDFs

Summary

INTRODUCTION

The digital age brings a set of challenges for the Web because of the abundance of information. Data is growing rapidly, resulting in information overload. The need to search such data has guided the development of the semantic Web, linked data, and Web applications. The semantic Web improves the flow of information using machine-processable metadata [1] and can link data from distributed data sources to make the data meaningful. Linked data integrates entities from different sources and, through its connected links, provides a way to crawl them as a single data space [2], [3]. This idea is fundamental to this work, which accesses the required information from multiple resources and integrates it for efficient searching.
