WEIDJ: Development of a new algorithm for semi-structured web data extraction

Ily Amalina Ahmad Sabri, Mustafa Man

doi:10.12928/telkomnika.v19i1.16205

Open Access

https://doi.org/10.12928/telkomnika.v19i1.16205

Abstract

In the era of industrial digitalization, people are increasingly investing in solutions that support data collection, data analysis and performance improvement. This paper considers advancing web-scale knowledge extraction and alignment by integrating a few sources and exploring different methods of aggregation and attention, with a focus on image information. The main aim of data extraction from semi-structured data is to retrieve useful information from the web. Such web data, also known as the deep web, is retrievable, but only through requests made by form submission, because search engines cannot reach it. As HTML documents grow larger, data extraction has been found to suffer from lengthy processing times. In this work, we propose an improved model, wrapper extraction of image using document object model (DOM) and JavaScript object notation data (JSON) (WEIDJ), in response to the promising results of mining a higher volume of images in various formats. To assess the efficiency of WEIDJ, we compare its data extraction performance at different levels of page extraction with VIBS, MDR, DEPTA and VIDE. WEIDJ yielded the best results, with a precision of 100, a recall of 97.93103 and an F-measure of 98.9547.
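
To make the abstract's two technical ideas concrete, the following is a minimal sketch and not the authors' WEIDJ implementation. It assumes Python's standard `html.parser`, which is an event-driven parser rather than a full DOM, and the names `ImageCollector`, `extract_images_as_json` and `f_measure` are hypothetical. It collects `<img>` elements from an HTML string into JSON, and it computes the F-measure as the harmonic mean of precision and recall, which is consistent with the figures reported above.

```python
import json
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src and alt attributes of every <img> element seen."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        if tag == "img":
            attr_map = dict(attrs)
            if attr_map.get("src"):
                self.images.append(
                    {"src": attr_map["src"], "alt": attr_map.get("alt") or ""}
                )


def extract_images_as_json(html):
    """Return a JSON array describing the images found in the HTML string."""
    collector = ImageCollector()
    collector.feed(html)
    return json.dumps(collector.images)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    page = '<div><img src="a.png" alt="A"><p>text</p><img src="b.jpg"/></div>'
    print(extract_images_as_json(page))
    # Precision and recall values as reported in the abstract.
    print(round(f_measure(100.0, 97.93103), 4))
```

With the reported precision of 100 and recall of 97.93103, the harmonic mean evaluates to about 98.9547, matching the stated F-measure.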

Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)	Publication Date: Feb 1, 2021
Citations: 3	License type: cc-by

Similar Papers

Web Data Extraction Approach for Deep Web using WEIDJ
Ily Amalina Ahmad Sabri ... Ahmad Nazari Mohd Rose
Procedia Computer Science | VOL. 163
01 Jan 2019

Performance Analysis for Mining Images of Deep Web
Ily Amalina Ahmad Sabri ... Mustafa Man
International Journal of Advanced Computer Science and Applications | VOL. 11
01 Jan 2020

Effective Web data extraction with standard XML technologies
Jussi Myllymaki
Computer Networks | VOL. 39
11 Apr 2002

Optimizing Data Extraction using Preprocessing for Enhanced Efficiency
Santosh V Chobe, Swati Nikam
Journal of Electrical Systems | VOL. 20
04 Apr 2024
