Abstract

In the context of RDF document matching and integration, the datatype information attached to literal objects is an important aspect to analyze in order to better determine similar RDF documents. In this paper, we present an RDF Datatype inFerring Framework, called RDF-F, which provides two independent datatype inference processes: 1) a four-step process consisting of (i) a predicate information analysis (i.e., deducing the datatype from an existing range property), (ii) an analysis of the object value itself by a pattern-matching process (i.e., recognizing the object's lexical space), (iii) a semantic analysis of the predicate name and its context, and (iv) a generalization of Numeric and Binary datatypes to ensure the integration; and 2) a non-ambiguous lexical-space-matching process, where the datatypes of literal values are inferred by modifying their representation to follow new lexical spaces. We evaluated the performance and the accuracy of both processes with datasets from DBpedia. Results show that the execution time of both processes is linear and that their accuracy can reach up to 97.10% and 99.30%, respectively.

Highlights

  • One of the main benefits offered by the Semantic Web initiative is the increased support of data sharing and the description of real resources on the Web, by defining standard data representation models such as RDF, the Resource Description Framework

  • We present an RDF Datatype inFerring Framework, called RDF-F, which provides two independent datatype inference processes: 1) a four-step process consisting of (i) a predicate information analysis, (ii) an analysis of the object value itself by a pattern-matching process, (iii) a semantic analysis of the predicate name and its context, and (iv) a generalization of Numeric and Binary datatypes to ensure the integration; and 2) a non-ambiguous lexical-space-matching process, where the datatypes of literal values are inferred by modifying their representation to follow new lexical spaces

  • The non-ambiguous lexical-space-matching process performs better than the four-step inference process in both accuracy (F-score of 99.30%) and performance (11.955 s), but it requires modifying the engines that manage RDF data as triples, as well as the RDF data itself, to support the new lexical space representations
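
To make the pattern-matching step and the notion of ambiguity more concrete, the following is a minimal sketch that matches a literal's lexical form against a few XSD lexical spaces. The regular expressions are simplified approximations, and the names `LEXICAL_PATTERNS` and `candidate_datatypes` are hypothetical; this is not the RDF-F implementation, and it does not reproduce the new lexical spaces proposed by the authors.

```python
import re
from typing import List

# Simplified regular expressions for a few XSD lexical spaces.
# Illustrative approximations only, not the patterns used by RDF-F.
LEXICAL_PATTERNS = {
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    "xsd:date": re.compile(
        r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"
    ),
}


def candidate_datatypes(lexical_form: str) -> List[str]:
    """Return every datatype whose simplified lexical space contains the value."""
    matches = [
        name
        for name, pattern in LEXICAL_PATTERNS.items()
        if pattern.fullmatch(lexical_form)
    ]
    # Any literal is a valid xsd:string, so it serves as the fallback.
    return matches or ["xsd:string"]


if __name__ == "__main__":
    for value in ["1", "true", "3.14", "2018-05-01", "Lima"]:
        print(value, "->", candidate_datatypes(value))
```

Under these simplified patterns, the value `1` matches the boolean, integer, and decimal lexical spaces at once. This kind of overlap is what makes plain pattern matching ambiguous, and it is why additional evidence (predicate information, semantic context) or non-overlapping lexical spaces are needed to settle on a single datatype.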


Summary

Introduction

One of the main benefits offered by the Semantic Web initiative is the increased support for data sharing and for the description of real resources on the Web, through standard data representation models such as RDF, the Resource Description Framework. Many efforts focus on describing the similarity between concepts, properties, and [...]. RDF describes resources as triples ⟨subject, predicate, object⟩, where subjects, predicates, and objects are all resources identified by IRIs. Objects can also be literals (e.g., a number, a string), which can be annotated with optional type information, called a datatype. A datatype is a classification of data; RDF adopts its datatypes from XML Schema [25]. Simple datatypes can be primitive (e.g., boolean, float), derived (e.g., long and int, derived from decimal), or user defined, in which case they are built from primitive and derived datatypes by constraining some of their properties (e.g., range, precision, length, format). Complex datatypes contain elements defined as either simple or complex datatypes.
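
As an illustration of this data model, the sketch below represents a triple whose object is a literal with an optional datatype, and walks part of the XML Schema derivation chain from a derived datatype up to its primitive base. The class names, the `primitive_ancestor` helper, and the `example.org` IRIs are hypothetical and chosen only for this example; the `BASE_TYPE` table covers only a small fragment of the XSD hierarchy.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: Optional[str] = None  # the datatype annotation is optional


@dataclass(frozen=True)
class Triple:
    subject: str                # IRI
    predicate: str              # IRI
    obj: Union[str, Literal]    # IRI or literal


# Fragment of the XSD derivation hierarchy: derived type -> base type.
BASE_TYPE = {
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:integer": "xsd:decimal",
}


def primitive_ancestor(datatype: str) -> str:
    """Follow base-type links until a type with no recorded base is reached."""
    while datatype in BASE_TYPE:
        datatype = BASE_TYPE[datatype]
    return datatype


if __name__ == "__main__":
    # Hypothetical triples: one typed literal and one untyped literal.
    typed = Triple("http://example.org/s", "http://example.org/p",
                   Literal("42", "xsd:int"))
    untyped = Triple("http://example.org/s", "http://example.org/q",
                     Literal("42"))
    print(primitive_ancestor(typed.obj.datatype))
    print(untyped.obj.datatype)
```

The untyped literal is the case that datatype inference targets: its lexical form is present, but nothing in the triple states which datatype it belongs to.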

