Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application

Moaiad Khder

doi:10.15849/ijasca.211128.11

Abstract

Web scraping, or web crawling, is the automatic extraction of data from websites using software. It is particularly important in fields such as Business Intelligence. Web scraping is a technology that allows us to extract structured data from text such as HTML, and it is especially useful where data is not provided in a machine-readable format such as JSON or XML. It can be used to gather prices from retail store sites in near real time and to provide further details. It can also be used to gather intelligence on illicit businesses, such as drug marketplaces on the darknet, giving law enforcement and researchers valuable data, such as drug prices and varieties, that would be unavailable with conventional methods. A web scraping program has been found to yield data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in modern fields. Implementing web scraping properly requires multiple technologies, such as spidering and pattern matching, which are discussed. This paper examines what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cybersecurity, how it can be done with the Python language, some of its main benefits, and what its future may look like. Special emphasis is placed on the ethical and legal issues.

Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, Legal, Ethical.
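As an illustration of the idea of extracting structured data from HTML with pattern matching, the following is a minimal Python sketch, not the paper's own implementation. It uses only the standard library (`html.parser` and `re`). The sample markup, the class names `name` and `price`, and the helper names `ProductParser` and `scrape_products` are all assumptions made for this example; a real scraper would fetch the page over the network and would need to respect the site's terms of use.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup standing in for a fetched retail page (assumption).
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span> <span class="price">$24.99</span></li>
  <li class="product"><span class="name">Toaster</span> <span class="price">$31.50</span></li>
</ul>
"""

# Pattern matching step: pull the numeric part out of a price string.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


class ProductParser(HTMLParser):
    """Collects one record per <li>, read from <span class="name|price">."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # class of the <span> currently open, if relevant
        self._current = {}   # record being built for the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._current["name"] = data.strip()
        elif self._field == "price":
            match = PRICE_RE.search(data)
            if match:
                self._current["price"] = float(match.group(1))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one list item: store the finished record.
            self.products.append(self._current)
            self._current = {}


def scrape_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in html."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


products = scrape_products(SAMPLE_HTML)
```

Here `products` holds one dictionary per list item, which is the kind of machine-readable structure (comparable to JSON) that the page itself does not provide. The spidering step, following links to discover further pages, is deliberately left out of this sketch.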


Journal: International Journal of Advances in Soft Computing and its Applications
Publication Date: Nov 28, 2021
Citations: 43

