Web Scraping: From Tools to Related Legislation and Implementation Using Python

Harshit Nigam, Prantik Biswas

doi:10.1007/978-981-15-9651-3_13

Abstract

The Internet is the largest database of information ever built by mankind. It contains a wide variety of self-explanatory content available in varied formats such as audio/video, text, and others. However, the poorly designed data that largely fills the Internet is difficult to extract and hard to use in an automated process. Web scraping eliminates the manual work of extracting and organizing information and provides an easy-to-use way to collect data from webpages, convert it into a desired format, and store it in a local repository. Owing to the vast scope of applications of Web scraping, ranging from lead generation to reputation and brand monitoring and from sentiment analysis to data augmentation in machine learning, many organizations use various tools to extract useful data. This study deals with different Web scraping tools and libraries, categorized into (i) partial tools, (ii) libraries and frameworks, and (iii) complete tools, that have been developed over the last few years and are extensively used to collect data and convert it into structured data for text-processing applications. This paper explores the terms Web scraping and Web crawling, categorizes the tools available in the current market, and enables readers to build their own Web scraper using one such tool. The paper also comments on the legality of Web scraping at the end.

Keywords: Web scraping, Legislation, Web data extraction, Scraping tool, DOM tree
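
The collect, convert, and store steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample HTML, the `LinkExtractor` class, and the `scrape_to_csv` function are hypothetical names chosen for illustration, and the page is an inline string rather than a downloaded one so that the example runs without network access. It uses only the standard-library `html.parser` and `csv` modules.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li class="paper"><a href="/p/1">Paper One</a></li>
    <li class="paper"><a href="/p/2">Paper Two</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs from every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_to_csv(html):
    """Extract links from the HTML and return them as CSV text."""
    parser = LinkExtractor()
    parser.feed(html)   # collect: walk the markup
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "url"])  # convert: fixed column layout
    writer.writerows(parser.rows)
    return out.getvalue()              # store: caller may write this to a file


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

Returning the CSV as a string keeps the sketch free of side effects; writing the result to a local file would complete the "store in a local repository" step.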

Similar Papers

Sentiment analysis using web scraping for live news data with machine learning algorithms
Parneet Kaur
Materials Today: Proceedings | VOL. 65
01 Jan 2021

Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application
Moaiad Khder
International Journal of Advances in Soft Computing and its Applications | VOL. 13
28 Nov 2021

Web Scraping Techniques and Applications: A Literature Review
Chaimaa Lotfi ... Swetha Srinivasan
01 Jan 2020

Web Scraping for E-Commerce Websites
Atharva Bankar
International Journal of Scientific Research in Engineering and Management | VOL. 08
06 Apr 2024
