Abstract
Inspired by the concept of Internet computing, a DHT-based distributed Web crawling model has been proposed to overcome the bottlenecks of traditional Web crawling systems. Building on this model, we propose optimizations that reduce the download time of Web crawling tasks and thereby increase the efficiency of the system. The reduction in download time is achieved by shortening the network distance between crawler and crawlee. By exploiting the mapping mechanism of the Content Addressable Network (CAN) over a Network Coordinate System (NC), the problem can be recast as minimizing the distances between peers and resources on the DHT overlay. This paper focuses on reducing these distances, seeking to provide an improved location-aware infrastructure for distributed Web crawling. We first propose a new DHT-based distributed Web crawling model. Then, under this model, we present a new method based on CAN's splitting schemes that yields a significant decrease in crawler-crawlee distance compared with existing schemes. In addition, the load-balancing issue is addressed by combining the new method with existing ones.
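To illustrate the core idea of the mapping mechanism, the following is a minimal sketch, not the paper's actual scheme: it assumes a 2-D CAN keyspace, normalizes hypothetical NC positions into that keyspace, and assigns each crawling target to the peer whose mapped point is nearest, so that a URL tends to be crawled by a network-nearby peer. All names (`nc_to_can_point`, `assign_url`), the dimensionality, and the normalization span are illustrative assumptions.

```python
import math

# Hypothetical sketch: map hosts onto a 2-D CAN keyspace [0, 1)^2 using their
# network-coordinate (NC) positions, then assign each URL to the peer whose
# mapped CAN point is closest. The 2-D space and normalization span are
# assumptions for illustration, not the paper's actual splitting scheme.

def nc_to_can_point(nc_coord, span=100.0):
    """Normalize an NC position (x, y) into the unit CAN keyspace."""
    return tuple(min(max(c / span, 0.0), 1.0 - 1e-9) for c in nc_coord)

def assign_url(url_nc, peers):
    """Pick the peer whose CAN point is nearest to the URL host's mapped point.

    peers: dict mapping peer_id -> NC coordinate of that peer.
    """
    target = nc_to_can_point(url_nc)
    return min(peers,
               key=lambda pid: math.dist(nc_to_can_point(peers[pid]), target))

# Example: a host whose NC position is close to peer-A is crawled by peer-A.
peers = {"peer-A": (10.0, 20.0), "peer-B": (80.0, 75.0)}
print(assign_url((12.0, 18.0), peers))  # -> peer-A
```

Under this mapping, minimizing crawler-crawlee network distance reduces to minimizing Euclidean distance between points on the CAN overlay, which is the distance the paper's splitting schemes then try to shrink.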