Abstract

In this work, we demonstrate the capability of a JavaScript-based web crawler to overcome anti-crawling measures such as CAPTCHAs and IP blocking. We also examine the ethical and legal dimensions of web crawling and provide recommendations for future research in this domain. A web crawler is an automated software program that navigates websites and extracts information. While it serves legitimate purposes such as website analysis and indexing, it can also be misused to harvest personal data, scrape content, and overload servers. Website administrators therefore employ anti-crawling techniques, such as CAPTCHAs, robots.txt, and IP blocking, to prevent malicious web crawlers from accessing their content. These techniques aim to curtail a crawler's ability to scrape, access, or overload website resources without impeding legitimate users' access to the content they need. The objective of this study is to demonstrate and enhance the resilience of legitimate web crawlers against anti-crawling techniques such as CAPTCHAs and IP blocking, challenging the notion that these measures are universally effective against all types of web crawlers.
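To illustrate the basic behavior the abstract describes (navigating pages and extracting information), the sketch below shows a minimal breadth-first crawler. It is a hypothetical example, not the paper's implementation; the start URL, page limit, and regex-based link extraction are assumptions made for brevity.

```typescript
// Minimal illustrative crawler (hypothetical; not the crawler studied in this work).
// Fetches a start page, extracts same-origin links, and visits them
// breadth-first up to a fixed page limit.

const START_URL = "https://example.com/"; // assumed placeholder target
const MAX_PAGES = 10;                     // assumed limit to keep the example bounded

async function crawl(startUrl: string): Promise<void> {
  const origin = new URL(startUrl).origin;
  const queue: string[] = [startUrl];
  const visited = new Set<string>();

  while (queue.length > 0 && visited.size < MAX_PAGES) {
    const url = queue.shift()!;
    if (visited.has(url)) continue;
    visited.add(url);

    const response = await fetch(url); // global fetch (Node 18+ or browsers)
    const html = await response.text();
    console.log(`Fetched ${url} (${html.length} bytes)`);

    // Naive link extraction with a regex; a production crawler would parse the DOM
    // and respect robots.txt before enqueueing URLs.
    for (const match of html.matchAll(/href="([^"]+)"/g)) {
      const link = new URL(match[1], url);
      if (link.origin === origin) {
        queue.push(link.href);
      }
    }
  }
}

crawl(START_URL).catch(console.error);
```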
