Abstract

Web data extraction has evolved over the years from extracting data from documents to extracting it from today’s World Wide Web (WWW). The growth of the WWW has placed data at the centre of this ecosystem and has benefited society at large, including businesses and consumers. The proposed system uses a deep learning technique, the Faster Region-based Convolutional Neural Network (Faster R-CNN), for automated navigation, data extraction, and self-healing of the data extraction engine so that it adapts to dynamic changes in website layout. The system trains a Faster R-CNN model to detect products on a web page using bounding-box image detection, and it extracts product details with high extraction accuracy. Deep learning techniques for image detection have advanced rapidly across many fields, but their application to web data extraction is what makes this paper unique. An e-commerce retail website is used as a real-world example to demonstrate the self-healing capability of the proposed automated web data extraction system.
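The abstract does not say how the detected bounding boxes are turned into product details. One plausible approach, shown here only as an illustrative assumption and not as the authors' method, is to match each confident detection on a page screenshot against the on-screen rectangles of DOM elements using intersection over union (IoU). The detector itself is not shown; its output is stubbed as (score, box) pairs. The names `Box`, `iou`, `match_detections` and both thresholds are hypothetical. Because matching works on rendered position rather than on a fixed selector, a scheme like this could keep working after markup changes, provided the detector still finds the product.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle in screenshot pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two rectangles; 0.0 when they do not overlap."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, elements, score_threshold=0.5, iou_threshold=0.5):
    """Return the text of the DOM element that best overlaps each confident detection.

    detections: list of (score, Box) pairs, e.g. from an object detector (assumed format).
    elements:   list of (text, Box) pairs for rendered DOM elements (assumed format).
    """
    results = []
    for score, det in detections:
        if score < score_threshold:
            continue  # discard low-confidence product detections
        best_text, best_iou = None, 0.0
        for text, rect in elements:
            overlap = iou(det, rect)
            if overlap > best_iou:
                best_text, best_iou = text, overlap
        # Keep the match only if the overlap is strong enough.
        if best_text is not None and best_iou >= iou_threshold:
            results.append(best_text)
    return results


if __name__ == "__main__":
    detections = [(0.92, Box(10, 10, 110, 60)), (0.30, Box(200, 200, 250, 250))]
    elements = [("Blue T-shirt $19.99", Box(12, 8, 108, 62)),
                ("Footer links", Box(0, 500, 800, 560))]
    print(match_detections(detections, elements))  # ['Blue T-shirt $19.99']
```

In the usage example, the low-scoring detection is dropped by the score threshold, and the remaining box overlaps the product element with an IoU of about 0.89, so only that element's text is returned.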
