Abstract

In order to serve a diversified user base with a range of purposes, general search engines offer search results for a wide variety of topics and material categories on the internet. While Focused Crawlers (FC) deliver more specialized and targeted results inside particular domains or verticals, general search engines give a wider coverage of the web. For a vertical search engine, the performance of a focused crawler is extremely important, and several ways of improvement are applied. We propose an intelligent, focused crawler which uses Reinforcement Learning (RL) to prioritize the hyperlinks for long-term profit. Our implementation differs from other RL based works by encouraging learning at an early stage using a decaying ϵ-greedy policy to select the next link and hence enables the crawler to use the experience gained to improve its performance with more relevant pages. With an increase in the infertility rate all over the world, searching for information regarding the issues and details about artificial reproduction treatments available is in need by many people. Hence, we have considered infertility domain as a case study and collected web pages from scratch. We compare the performance of crawling tasks following ϵ-greedy and decaying ϵ-greedy policies. Experimental results show that crawlers following a decaying ϵ-greedy policy demonstrate better performance

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.