A Comparative Study of Ranking Techniques for Hidden Web and Surface Web

Jyoti Yadav, Anil Kumar, Suman Rani

doi:10.15680/ijircce.2015.0305134

Abstract

The Web consists of the surface web and the hidden web. The surface web, also known as the publicly indexable web, can be accessed by search engines by following the hyperlinks present on pages and by using simple keyword-matching schemes. The hidden web refers to content that lies behind HTML forms. It contains a large collection of data that is unreachable by link-based search engines. A study conducted at the University of California, Berkeley estimated that the deep web consists of around 91,000 terabytes of data, whereas the surface web is only about 167 terabytes. Hidden web and surface web crawlers both return huge result sets for a user query, but users commonly look only at the top ten or twenty results that can be seen without scrolling and rarely look at results beyond the first response page, so the results need to be ranked. Ranking web data remains a major challenge, and various researchers have attempted to propose better and more efficient ranking techniques. In this paper, various ranking methods for both the hidden web and the surface web are explored.
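The abstract's point that a large result set must be ranked so that only the first ten or twenty results matter can be illustrated with a minimal sketch. The scoring rule here, which counts how many distinct query terms appear in a document, is a hypothetical stand-in for "simple keyword matching" and is not one of the ranking methods surveyed in this paper; the names `keyword_score` and `top_k` are illustrative only.

```python
import heapq


def keyword_score(query: str, document: str) -> int:
    """Hypothetical score: number of distinct query terms found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)


def top_k(query: str, documents: list[str], k: int = 10) -> list[tuple[int, str]]:
    """Return the k best (score, document) pairs, highest score first.

    heapq.nlargest keeps only k candidates, so the full result set is never
    sorted; documents with equal scores keep their original input order.
    """
    scored = [(keyword_score(query, doc), doc) for doc in documents]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    results = [
        "hidden web crawler",
        "surface web ranking",
        "cooking recipes",
        "web ranking for hidden web forms",
    ]
    for score, doc in top_k("hidden web ranking", results, k=2):
        print(score, doc)
```

Keeping only the top k with a bounded heap reflects the observation that users rarely look past the first page, so there is little value in fully ordering the whole result set.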
