Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks

Sudhir Kumar Patnaik, Mukul Bhave, C Narendra Babu

doi:10.26599/bdma.2021.9020012

Open Access

https://doi.org/10.26599/bdma.2021.9020012

Abstract

Data, collected using advanced web scraping technologies, are crucial to the growth of e-commerce in today's world of highly demanding, hyper-personalized consumer experiences. However, core data extraction engines fail because they cannot adapt to dynamic changes in website content. This study investigates an intelligent and adaptive web data extraction system built on convolutional and Long Short-Term Memory (LSTM) networks. It uses the You Only Look Once (YOLO) algorithm for automated web page detection and Tesseract LSTM to extract product details, which are detected as images from web pages. This state-of-the-art system does not need a core data extraction engine and can thus adapt to dynamic changes in website layout. Experiments conducted on real-world retail cases demonstrate an image detection precision of 97% and a character extraction precision of 99%. In addition, a mean average precision of 74% is obtained with an input dataset of 45 objects or images.
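The abstract reports precision for both image detection and character extraction but does not state how either figure was computed. The following is a minimal sketch of one common way to compute such figures, not the authors' evaluation code. The box format, the intersection-over-union (IoU) threshold of 0.5, and the function names `iou`, `detection_precision`, and `character_precision` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_precision(predicted: List[Box], truth: List[Box],
                        threshold: float = 0.5) -> float:
    """Fraction of predicted boxes that match an unused ground-truth box.

    The 0.5 IoU threshold is an assumption, not a value from the paper.
    """
    unused = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unused, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unused.remove(best)  # each ground-truth box can match only once
    return true_positives / len(predicted) if predicted else 0.0


def character_precision(extracted: str, reference: str) -> float:
    """Fraction of extracted characters that also appear, in order, in the reference."""
    if not extracted:
        return 0.0
    matcher = SequenceMatcher(None, extracted, reference, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(extracted)


if __name__ == "__main__":
    truth_boxes = [(0, 0, 10, 10)]
    predicted_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
    print(detection_precision(predicted_boxes, truth_boxes))
    print(character_precision("Pr1ce", "Price"))
```

A mean average precision would additionally average per-class precision over recall levels; that step is omitted here because the source gives no details on the classes or thresholds used.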

Highlights

For each experiment, the following figures are included: an input image with and without user login, indicating an error/no error condition; an output image with a bounding box user login, indicating an error/no error condition and demonstrating the automated image or object detection capability of the deep-learning-based You Only Look Once (YOLO) model; and output text demonstrating the work of the proposed web data extraction algorithm.
4.3.1 Experiments 1 and 2: Extracting data from single product specification in the Amazon retail site without and with changes in website layout
Data extracted from single product specification pages in the Amazon retail website are shown in Figs. 9 and 10
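The detect-then-read flow described above (bounding boxes from an object detector, then text from each detected region) can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's algorithm: the names `Detection`, `crop`, and `extract_fields`, the labels `"title"` and `"price"`, and the `min_score` cutoff are hypothetical. The detector and recognizer are passed in as plain callables; in practice they might wrap a trained YOLO model and Tesseract's LSTM engine, but stand-in stubs are used here so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Image = Sequence[Sequence[int]]      # rows of pixel values (toy representation)
Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    label: str    # hypothetical class name, e.g. "title" or "price"
    box: Box
    score: float  # detector confidence in [0, 1]


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region inside the bounding box out of the page image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def extract_fields(image: Image,
                   detect: Callable[[Image], List[Detection]],
                   recognize: Callable[[List[List[int]]], str],
                   min_score: float = 0.5) -> Dict[str, str]:
    """Detect labelled regions, then read the text of the best region per label."""
    fields: Dict[str, str] = {}
    for det in sorted(detect(image), key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            continue  # drop low-confidence boxes (cutoff is an assumption)
        if det.label not in fields:
            fields[det.label] = recognize(crop(image, det.box)).strip()
    return fields


# Stand-in stubs: a real system would call an object detector and an OCR engine.
def fake_detect(image: Image) -> List[Detection]:
    return [Detection("title", (0, 0, 4, 1), 0.97),
            Detection("price", (0, 1, 2, 2), 0.91),
            Detection("price", (2, 1, 4, 2), 0.30)]


def fake_recognize(region: List[List[int]]) -> str:
    # The toy "image" stores character codes, so decoding stands in for OCR.
    return "".join(chr(value) for row in region for value in row)


page = [[ord(c) for c in "Lamp"], [ord(c) for c in "$9  "]]
result = extract_fields(page, fake_detect, fake_recognize)

if __name__ == "__main__":
    print(result)
```

Because the extraction step depends only on detected regions rather than on page markup, a layout change affects the detector's input image but not the code path, which is the adaptability property the study emphasizes.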

Summary

Introduction

Data, collected using advanced web scraping technologies, are crucial to the growth of e-commerce in today's world of highly demanding, hyper-personalized consumer experiences. Recent advancements in machine learning and Artificial Intelligence (AI) have unfolded new opportunities, even in extensively studied research programs in numerous domains, including medical imaging (e.g., image recognition), transportation (e.g., feature extraction in self-driving cars)[1,2], and traffic scenarios (e.g., object detection)[3,4]. These advancements encourage the [...] (pdf, doc, or txt files), websites, and images that use Optical Character Recognition (OCR)[5], subsequently inspiring the development of automated web data extraction systems through leading-edge technology solutions[6,7]. Web data extraction is explored using repetitive blocks[11], with their respective attributes obtained from classification-based approaches. This data extraction technique demonstrates good accuracy and adaptability to layout changes in websites. Convolutional Neural Networks (CNNs) are typically applied to accurate object detection[13], semantic segmentation[22] (using a selective search algorithm to propose possible regions of interest)[16,23], and object classification.

Journal: Big Data Mining and Analytics	Publication Date: Dec 1, 2021
Citations: 25	License type: cc-by

Similar Papers

Gujarati Task Oriented Dialogue Slot Tagging Using Deep Neural Network Models
Rachana Parikh ... Hiren Joshi
01 Jan 2020

Intelligent phishing detection scheme using deep learning algorithms
Moruf Akin Adebowale ... Khin T Lwin
Journal of Enterprise Information Management | VOL. 36
04 Jun 2020

DEEPFAKE DETECTION USING DEEP LEARNING (CNN+LSTM)
Mohd Salim Shaikh ... Rupesh Sharma
INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT | VOL. 07
01 Nov 2023

Deep Air Quality Forecasts: Suspended Particulate Matter Modeling With Convolutional Neural and Long Short-Term Memory Networks
Ekta Sharma ... Alfio V Parisi
IEEE Access | VOL. 8
01 Jan 2020
