Ranked Deep Web Page Detection Using Reinforcement Learning and Query Optimization

Kapil Madan, Rajesh K Bhatia

doi:10.4018/ijswis.2021100106

Abstract

This paper proposes a novel algorithm, based on the reinforcement learning method asynchronous advantage actor-critic (A3C), that optimizes overflow queries in order to crawl the ranked deep web. Queries are derived from a domain-based taxonomy and are used to fill search forms; A3C assigns rewards and penalties to these queries. An overflow query is a query that matches more than k results, of which only the top k are retrieved. Lower-ranked documents beyond the top k are not accessible, which leads to low coverage. The proposed technique converts overflow queries into non-overflow queries and thereby increases coverage. To date, no research has explored the use of A3C with a taxonomy in the domain of the ranked deep web. The experimental results show that the proposed technique outperforms three other techniques (document frequency, random query, and high frequency) on the average improvement metric by 26%, 69%, and 92%, respectively.
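
The distinction between overflow and non-overflow queries can be illustrated with a minimal sketch. It is not the paper's A3C algorithm: the toy corpus, the value of k, and the helper names (`matches`, `search`, `is_overflow`, `coverage`) are all assumptions made for illustration, and the refinement step simply adds a narrower term by hand where the paper would select it from the taxonomy.

```python
from typing import List, Set

# Hypothetical toy corpus: each document is represented by its set of terms.
CORPUS: List[Set[str]] = [
    {"book", "fiction", "mystery"},
    {"book", "fiction", "romance"},
    {"book", "science", "physics"},
    {"book", "science", "biology"},
    {"book", "history"},
]

K = 2  # assumed result limit: the interface returns at most K documents


def matches(query: Set[str]) -> List[int]:
    """Ids of all documents that contain every query term."""
    return [i for i, doc in enumerate(CORPUS) if query <= doc]


def search(query: Set[str]) -> List[int]:
    """Simulate a ranked interface that exposes only the top K matches."""
    return matches(query)[:K]


def is_overflow(query: Set[str]) -> bool:
    """A query overflows when it matches more than K documents."""
    return len(matches(query)) > K


def coverage(queries: List[Set[str]]) -> float:
    """Fraction of the corpus retrieved by issuing all the given queries."""
    seen: Set[int] = set()
    for q in queries:
        seen.update(search(q))
    return len(seen) / len(CORPUS)


broad = [{"book"}]  # matches 5 documents, only 2 are returned
refined = [{"book", "fiction"}, {"book", "science"}, {"book", "history"}]

print(is_overflow({"book"}), coverage(broad))
print([is_overflow(q) for q in refined], coverage(refined))
```

In this toy setting the single broad query overflows and reaches only part of the corpus, while the three narrower queries each stay within the limit and together reach every document.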

Full Text