Abstract
The Semantic Web is not just a matter of translating HTML into RDF/OWL. It is a matter of understanding web content through knowledge graphs, in which entities are connected by relationships. This content is composed of resources (web pages) that contain, for example, text, images and audio, so entities must be extracted from these resources. Currently, most web content is in HTML5, a W3C recommendation that describes structure only marginally, with the help of annotations. The main challenge is to transform unstructured data in plain HTML files into structured data (e.g., RDF or OWL). This work provides first-hand information on dealing with unstructured, heterogeneous data residing on the web, using Twinkle, a Java tool for executing SPARQL queries, on a FOAF (Friend Of A Friend) document.
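As a rough illustration of the HTML-to-RDF step described in the abstract, the following minimal Python sketch pulls the page title and outgoing links out of plain HTML and emits them as N-Triples lines. It is not the method used in the paper. The class and function names, the example URLs, and the choice of the Dublin Core terms `title` and `references` as predicates are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Dublin Core Terms namespace; choosing it here is an assumption of this sketch.
DCT = "http://purl.org/dc/terms/"


class LinkAndTitleExtractor(HTMLParser):
    """Collects the <title> text and every <a href> target of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def html_to_ntriples(html, page_url):
    """Return N-Triples lines describing the title and links of one page."""
    parser = LinkAndTitleExtractor()
    parser.feed(html)
    parser.close()
    triples = []
    title = parser.title.strip()
    if title:
        triples.append(f'<{page_url}> <{DCT}title> "{escape_literal(title)}" .')
    for href in parser.links:
        target = urljoin(page_url, href)  # resolve relative links
        triples.append(f"<{page_url}> <{DCT}references> <{target}> .")
    return triples


sample_html = (
    "<html><head><title>Alice</title></head>"
    '<body><a href="/bob">Bob</a></body></html>'
)
sample_triples = html_to_ntriples(sample_html, "http://example.org/alice")
```

For the sample page, `sample_triples` holds two lines: one giving the page title as a literal and one linking the page to `http://example.org/bob`. Real entity extraction from text, images or audio would need far more than this tag-level pass.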
Highlights
1.1 Current State of Web: The web today is highly unstructured; it is a vast repository of interconnected documents, presented to end users as a large collection of interlinked documents
The web is now mature enough, owing to technologies such as XML, ontologies and SPARQL, among others, which strive to bring structure and semantics to the otherwise unstructured and heterogeneous web
SPARQL plays a key role in executing queries against heterogeneous data sources whose data is either natively in RDF or has been transformed into RDF by a middleware application
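To make the idea of querying RDF data more concrete, here is a minimal Python sketch of the basic graph pattern matching that underlies a SPARQL SELECT query, run over a handful of FOAF-style triples. It is not a SPARQL engine and not Twinkle, which the abstract describes as a Java tool. The sample data, the `ex:` shorthand identifiers, and the function names are assumptions for illustration; terms beginning with `?` are treated as variables. The patterns correspond roughly to the SPARQL query `SELECT ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }`.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data; "ex:alice" and "ex:bob" stand in for full IRIs.
TRIPLES = [
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:bob", FOAF + "name", "Bob"),
]


def is_var(term):
    """Terms that start with '?' are query variables (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Names of the people Alice knows.
patterns = [
    ("ex:alice", FOAF + "knows", "?friend"),
    ("?friend", FOAF + "name", "?name"),
]
results = query(TRIPLES, patterns)
names = [row["?name"] for row in results]
```

With the sample data, `names` contains the single value `"Bob"`. The nested-loop join is chosen for readability only; a real query engine would index the triples and plan the join order.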
Summary
The web today is highly unstructured: it is a vast repository of interconnected documents, presented to end users as a large collection of interlinked documents. The persistence of documents cannot be uniformly guaranteed. HTML's simplicity comes at the cost of interoperability: HTML documents are human-readable, but extensive groundwork is needed to make them machine-readable and interoperable across different software. XML emerged in response, adding structure to unstructured HTML data in the form of DTDs and schemas. The web is now mature enough, owing to technologies such as XML, ontologies and SPARQL, among others, which strive to bring structure and semantics to the otherwise unstructured and heterogeneous web.
More From: International Journal of Engineering and Advanced Technology