Abstract

Information retrieval and integration of web data is a recent trend in today's technology landscape. A huge amount of data is available in online repositories, but most of it is hidden behind deep web interfaces. Because the deep web is growing at a very fast rate, it is becoming difficult to efficiently locate deep-web interfaces and retrieve the required data. The large volume of web resources and the dynamic nature of the deep web offer wide coverage and efficient information availability, but they also pose a challenging issue in the field of information retrieval. The rapid growth of the World-Wide Web creates extraordinary scaling challenges for general-purpose crawlers and search engines. Present-day crawlers retrieve content only from the openly indexed web, i.e., the set of web pages that are directly or easily reachable by hypertext links, without considering search forms or pages with prerequisites such as authorization or prior registration. This paper reviews methodologies for content extraction from deep web interfaces.

