A Survey on Content Based Crawling for Deep and Surface Web

Nishchay Agrawal, Suchi Johari

doi:10.1109/iciip47207.2019.8985906

Abstract

The World Wide Web (WWW) contains a massive amount of content, and fetching relevant information from it is a difficult task. Web crawlers play an important role in fetching relevant content from the WWW and in indexing web pages. To accommodate the rapidly increasing number of user requests, an efficient and optimized crawler is required. The content of surface web pages is directly accessible to all users, but the content of the deep web is not exposed to users. Crawling the hidden web is even more difficult. Researchers have proposed algorithms for different web crawlers that fetch information from the surface and deep web in an efficient and optimized manner. In this paper, we review different web crawlers and classify them based on the information they fetch. The paper provides a comparative analysis of web crawlers that fetch information based on the URL, the deep web, and the surface web.

Full Text