A QIIIEP based domain specific hidden web crawler

D K Sharma, A K Sharma

doi:10.1145/1980022.1980073

Abstract

For context-based surfing of the World Wide Web in a systematic and automatic manner, a web crawler is required. The World Wide Web consists of interlinked documents and resources that are easily crawled by a general web crawler, known as a surface web crawler. Crawling hidden web data, in which the data is hidden behind HTML forms, requires a special type of crawler, known as a hidden web crawler. For efficient crawling of hidden web data, the discovery of relevant and proper HTML forms is a very important step. For this purpose, a technique for a domain-specific hidden web crawler is proposed in this paper. The proposed technique is based on domain-specific crawling of the World Wide Web. In this approach, links are followed in a step-by-step manner, which yields a large source of hidden web databases. Experimental results verify that the proposed approach is quite effective in crawling hidden web data contents.
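To make the idea of domain-specific form discovery concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a form counts as relevant when one of its field names contains a domain keyword, and it crawls a small in-memory site instead of the live web. The names `PageParser`, `is_relevant`, `crawl`, `PAGES`, and the keyword list are all hypothetical and chosen only for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects outgoing links and the field names of each form on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []        # one list of field names per <form>
        self._current = None   # field names of the form currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self._current = []
        elif tag in ("input", "select", "textarea") and self._current is not None:
            self._current.append((attrs.get("name") or "").lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_relevant(fields, keywords):
    # Assumption: a form belongs to the domain if any field name
    # contains any domain keyword as a substring.
    return any(k in f for f in fields for k in keywords)


def crawl(pages, seed, keywords, max_pages=100):
    """Follow links breadth-first from the seed and return the pages
    that hold at least one domain-relevant form."""
    queue, seen, found = deque([seed]), {seed}, []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = pages.get(url)   # stand-in for an HTTP fetch
        if html is None:
            continue
        visited += 1
        parser = PageParser()
        parser.feed(html)
        if any(is_relevant(f, keywords) for f in parser.forms):
            found.append(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Hypothetical four-page site used only to exercise the sketch.
PAGES = {
    "/": '<a href="/books">Books</a> <a href="/about">About</a>',
    "/books": '<form action="/search"><input name="book_title">'
              '<input name="author"></form><a href="/login">Login</a>',
    "/about": "<p>About us</p>",
    "/login": '<form><input name="username"><input name="password"></form>',
}

BOOK_KEYWORDS = ["title", "author", "isbn"]
relevant_pages = crawl(PAGES, "/", BOOK_KEYWORDS)
```

With the book-domain keywords, only the search form on `/books` is reported; the login form on `/login` is visited but rejected because none of its field names match. A real crawler would replace the dictionary lookup with an HTTP fetch and resolve relative links against the page URL.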

Full Text