A Fast Distributed Focused-web Crawling

Harry T Yani Achsan, Wahyu Catur Wibowo

doi:10.1016/j.proeng.2014.03.017


Open Access


Abstract

Mining data from a web database has become more challenging in recent years because of the explosive growth of data, the rise of the dynamic web, and improvements in web security. Mining a web database differs from mining web sites in general because it aims to collect specific data from a single web site. Collecting a very large amount of data in a limited time tends to be detected as a cyber attack, and the crawler is then banned from connecting to the web server. To avoid this problem, this paper proposes a crawling method that mines a web database faster and more cheaply than conventional web crawlers. The method runs hundreds of threads from a single web crawler on a single computer and distributes the threads across hundreds or thousands of publicly available proxy servers. This strategy greatly increases mining speed and is more secure than using a single-threaded web crawler.
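The dispatching idea in the abstract (many threads in one crawler process, each request routed through one of many proxy servers) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `crawl` and `fetch_via_proxy`, the round-robin proxy assignment, the default of 100 threads, and the 10-second timeout are all assumptions made for the example, and the abstract does not say how threads are mapped to proxies.

```python
import itertools
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch one URL through one HTTP proxy (hypothetical helper, stdlib only)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, proxies, fetch=fetch_via_proxy, max_threads=100):
    """Fetch every URL on a worker thread, assigning proxies round-robin.

    Returns a list of (url, proxy, result) tuples in the same order as `urls`;
    `result` is the fetched body, or the exception raised for that request.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")

    # Assumption: round-robin pairing, so requests are spread evenly over proxies.
    jobs = list(zip(urls, itertools.cycle(proxies)))

    def run(job):
        url, proxy = job
        try:
            return url, proxy, fetch(url, proxy)
        except Exception as error:
            # A dead or slow proxy should not stop the rest of the crawl.
            return url, proxy, error

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(run, jobs))
```

The `fetch` parameter is injectable so that the scheduling logic can be exercised without network access, for example by passing a stub function in place of `fetch_via_proxy`. A real crawler would also need retry and proxy health tracking, which this sketch leaves out.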


Journal: Procedia Engineering
Publication Date: Jan 1, 2014
Citations: 54
License type: cc-by-nc-nd


Similar Papers

Predicting Vulnerabilities in Web Applications Based on Website Security Model
Ivan Kovacevic ... Mihael Marovic
22 Sep 2022

Feature evaluation for web crawler detection with data mining techniques
Dusan Stevanovic ... Natalija Vlajic
Expert Systems with Applications | VOL. 39
07 Feb 2012

Intelligent Web Crawler for Supporting Big Data Analysis Services
Dongmin Seo ... Hanmin Jung
The Journal of the Korea Contents Association | VOL. 13
28 Dec 2013

Web Services Oriented Transactions Using Partial Dependencies
R Logothetis ... Jinli Cao
05 Dec 2005
