An Efficient Approach to Extract and Store Big Semantic Web Data Using Hadoop and Apache Spark GraphX

Wria Mohammed Salih Mohammed, Alaa Khalil Ju Maa

doi:10.14201/adcaij.31506

Abstract

The volume of data is growing at an astonishingly high speed, and traditional techniques for storing and processing data, such as relational and centralized databases, have become inefficient and time-consuming. Linked Data and the Semantic Web make internet data machine-readable. As the volume of Linked Data and Semantic Web data increases, storing and processing it with traditional approaches is no longer sufficient, because the hardware resources of a single machine are limited. To solve this problem, datasets must be stored using distributed, clustered methods. Hadoop can store large datasets by distributing them across the hard disks of many machines in a cluster, and Apache Spark can process data in parallel more efficiently than Hadoop MapReduce because Spark processes data in memory instead of on the hard disk. In this paper, Semantic Web data is stored and processed using Apache Spark GraphX and the Hadoop Distributed File System (HDFS). Spark's in-memory, distributed computing enables efficient analysis of massive datasets stored in HDFS, and Spark GraphX supports graph-based processing of Semantic Web data. The fundamental objective of this work is to provide a way to combine Semantic Web and big data technologies efficiently, so that their combined strengths can be used in data analysis and processing. First, the proposed approach uses the SPARQL query language to extract Semantic Web data from DBpedia, a large, publicly available Semantic Web dataset built from Wikipedia. Second, the extracted Semantic Web data is converted to the GraphX data format by generating vertex and edge files; this conversion is implemented using Apache Spark GraphX. Third, both the vertex and edge tables are stored in HDFS, where they are available for visualization and analysis.
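The first step, extracting triples from DBpedia with SPARQL, can be sketched with Python's standard library. This is a minimal illustration, not the authors' implementation: the public endpoint `https://dbpedia.org/sparql`, the example query, and the helper names `build_request`, `bindings_to_triples` and `fetch_triples` are assumptions made for the example. The request follows the W3C SPARQL 1.1 Protocol (the query is sent as the `query` parameter and JSON results are requested through the `Accept` header), and each result row is flattened into a (subject, predicate, object) tuple that a later conversion step could consume.

```python
"""Sketch: extract triples from DBpedia with a SPARQL SELECT query (standard library only)."""
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint; assumed here, since the paper's exact endpoint is not given.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical example query: up to 100 triples whose subject is a DBpedia city.
QUERY = """
SELECT ?s ?p ?o WHERE {
  ?s a <http://dbpedia.org/ontology/City> ;
     ?p ?o .
} LIMIT 100
"""


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_triples(results: dict) -> list[tuple[str, str, str]]:
    """Flatten SPARQL JSON results for ?s ?p ?o into (subject, predicate, object) tuples."""
    return [
        (row["s"]["value"], row["p"]["value"], row["o"]["value"])
        for row in results["results"]["bindings"]
    ]


def fetch_triples(query: str = QUERY) -> list[tuple[str, str, str]]:
    """Run the query against the live endpoint (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return bindings_to_triples(json.load(response))


if __name__ == "__main__":
    # Offline check on a hand-written result document in the SPARQL 1.1 JSON results format.
    sample = {
        "head": {"vars": ["s", "p", "o"]},
        "results": {"bindings": [{
            "s": {"type": "uri", "value": "http://dbpedia.org/resource/Erbil"},
            "p": {"type": "uri", "value": "http://dbpedia.org/ontology/country"},
            "o": {"type": "uri", "value": "http://dbpedia.org/resource/Iraq"},
        }]},
    }
    print(bindings_to_triples(sample))
```

Only `fetch_triples` touches the network; the other two functions can be exercised offline, as the `__main__` block does with a hand-written result document.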
Furthermore, the proposed technique improves storage efficiency: converting Semantic Web data to GraphX files reduces the required storage space by almost half, from around 133.8 for the RDF data to 75.3 for the GraphX files (a reduction of about 44%). Adopting the parallel data processing provided by Apache Spark also reduces the time required for data processing and analysis. This article concludes that Apache Spark GraphX can enhance Semantic Web and Big Data technologies. By converting Semantic Web data to GraphX format, we reduce data size and processing time, enabling efficient data management and seamless integration.
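The conversion of triples into GraphX vertex and edge tables, and why it can save space, can be illustrated in plain Python. This is a sketch under stated assumptions, not the paper's Spark GraphX code (PySpark has no GraphX API): it assumes one plausible layout that the paper does not spell out, in which every distinct subject or object becomes a vertex with a numeric ID (GraphX `VertexId` values are `Long`) and every triple becomes an edge whose attribute is the predicate IRI. The sample triples and the helper names `to_graphx_tables`, `as_tsv` and `as_ntriples` are hypothetical. Because each IRI is written once in the vertex table and then referenced by ID in the edge table, the two tables can be smaller than the equivalent N-Triples text; this is an assumed explanation of the mechanism, and the toy sample does not reproduce the paper's 133.8 vs 75.3 figures.

```python
"""Sketch: convert (subject, predicate, object) triples into GraphX-style vertex and edge tables."""

# Hypothetical sample of DBpedia-style triples; real data would come from the SPARQL step.
SAMPLE = [
    ("http://dbpedia.org/resource/Erbil", "http://dbpedia.org/ontology/country", "http://dbpedia.org/resource/Iraq"),
    ("http://dbpedia.org/resource/Baghdad", "http://dbpedia.org/ontology/country", "http://dbpedia.org/resource/Iraq"),
    ("http://dbpedia.org/resource/Mosul", "http://dbpedia.org/ontology/country", "http://dbpedia.org/resource/Iraq"),
]


def to_graphx_tables(triples):
    """Return (vertices, edges): vertices are (id, term) rows, edges are (src_id, dst_id, predicate) rows."""
    ids: dict[str, int] = {}

    def vertex_id(term: str) -> int:
        # setdefault assigns the next integer only the first time a term is seen,
        # so each distinct IRI or literal becomes exactly one vertex.
        return ids.setdefault(term, len(ids) + 1)

    edges = [(vertex_id(s), vertex_id(o), p) for s, p, o in triples]
    vertices = [(i, term) for term, i in ids.items()]
    return vertices, edges


def as_tsv(rows) -> str:
    """Serialize rows as tab-separated lines, one row per line."""
    return "".join("\t".join(str(field) for field in row) + "\n" for row in rows)


def as_ntriples(triples) -> str:
    """Serialize triples as N-Triples text (simplified: every term is treated as an IRI)."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


if __name__ == "__main__":
    vertices, edges = to_graphx_tables(SAMPLE)
    rdf_size = len(as_ntriples(SAMPLE).encode("utf-8"))
    graph_size = len(as_tsv(vertices).encode("utf-8")) + len(as_tsv(edges).encode("utf-8"))
    print(as_tsv(vertices), as_tsv(edges), sep="\n")
    print(f"N-Triples: {rdf_size} bytes; vertex + edge tables: {graph_size} bytes")
```

In Spark, rows of this shape map onto the Scala GraphX constructor `Graph(vertices, edges)`, with `vertices` an `RDD[(VertexId, String)]` and `edges` an `RDD[Edge[String]]` built from `Edge(srcId, dstId, predicate)`. Once written to local files such as `vertices.tsv` and `edges.tsv` (hypothetical names), the tables could be copied into HDFS with, for example, `hdfs dfs -put vertices.tsv edges.tsv /data/dbpedia_graph/`, where the target directory is likewise hypothetical.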

Journal: ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
Publication Date: Jun 5, 2024
License type: CC BY-NC-ND 4.0

Similar Papers

Mining Association Rules from Semantic Web Data without User Intervention
01 Jan 2020

SWARM: An Approach for Mining Semantic Association Rules from Semantic Web Data
Molood Barati ... Qing Liu
01 Jan 2015

A Semantics-Based, End-User-Centered Information Visualization Process for Semantic Web Data
Martin Voigt ... Klaus Meißner
01 Jan 2013

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Mohammad Farhan Husain ... Pankil Doshi
01 Jan 2009
