Abstract

In the Semantic Web community, many approaches have been developed for generating RDF (Resource Description Framework) resources. However, these approaches often produce duplicate resources, which are stored without elimination. Duplicate resources degrade data quality and unnecessarily inflate the size of the dataset. We propose an approach for detecting duplicate resources in RDF datasets using Hadoop and the MapReduce framework. RDF resources are compared using similarity metrics defined at the resource level, the RDF statement level, and the object level. Performance is assessed with standard evaluation metrics, and the experimental evaluation demonstrates the accuracy, effectiveness, and efficiency of the proposed approach.
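The core idea of statement-level comparison can be illustrated with a minimal sketch. The snippet below is a hypothetical, single-machine approximation of the pairwise comparison the abstract describes; the `jaccard` similarity function, the `find_duplicates` helper, the 0.8 threshold, and the toy dataset are all illustrative assumptions, not the paper's actual implementation or metrics.

```python
# Illustrative sketch (assumption): duplicate detection over RDF resources
# by comparing their sets of (predicate, object) statements.

def jaccard(a, b):
    """Jaccard similarity between two sets of (predicate, object) pairs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def find_duplicates(resources, threshold=0.8):
    """Compare every resource pair; report pairs whose statement-level
    similarity meets the threshold (a stand-in for the paper's metrics)."""
    subjects = list(resources)
    dupes = []
    for i in range(len(subjects)):
        for j in range(i + 1, len(subjects)):
            s1, s2 = subjects[i], subjects[j]
            if jaccard(resources[s1], resources[s2]) >= threshold:
                dupes.append((s1, s2))
    return dupes

# Toy dataset (hypothetical): subject IRI -> set of (predicate, object) pairs.
data = {
    "ex:alice1": {("foaf:name", "Alice"), ("foaf:mbox", "alice@example.org")},
    "ex:alice2": {("foaf:name", "Alice"), ("foaf:mbox", "alice@example.org")},
    "ex:bob":    {("foaf:name", "Bob"),   ("foaf:mbox", "bob@example.org")},
}
```

In a MapReduce setting, such pairwise comparisons would typically be distributed by emitting candidate resource pairs from mappers and computing similarities in reducers, rather than looping over all pairs on one machine as above.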
