An architecture for extracting information from hidden web databases using intelligent agent technology through reinforcement learning

Lohit Singh,Dilip Kumar Sharma

doi:10.1109/cict.2013.6558108

Abstract

The web contains enormous amount of information. From that enormous information only small amount of that information is visible to users and a huge portion of the information is not visible to the users. This is because traditional search engines are not able to index or access all information. The information which can be retrieved by following hypertext links are accessed by such traditional search engines. The forms which are not accessed by traditional search engines include login or authorization process. Hidden web refers to that part of the web which is not accessed by traditional web crawlers. An important problem of retrieving desired and good quality of information from huge hidden web database is how to find out and identify the entry points of hidden web databases i.e., forms, in the Web. The traditional web crawlers may be unable to retrieve all information from deep web databases. Therefore it is the main cause of motivation for retrieving information from deep web. Issues and challenges related to the problem are also discussed. An architecture for accessing hidden web databases that uses an intelligent agent technology through reinforcement learning is proposed. The experimental results show that the reinforcement learning helps in overcoming existing problems and outperforms the existing hidden web crawlers in terms of precision and recall.

Full Text