Abstract

There are three core issues recognized for WAN-based distributed Web crawling systems: Web Partition, Agent collaboration and Agent deployment. Centering around these issues, this paper presents a comprehensive overview of the current strategies adopted by academic and business communities. The experiences, problems and challenges encountered by the WAN-based distributed Web crawlers are classified and discussed in depth. A summary of the current evaluation indicators is also given. Finally, conclusion and some suggestions for future research are put forward.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call