Abstract

With the rapid increase in demand for digital information over the internet, it has become imperative for search engines to serve up-to-date information in response to user queries. A web crawler plays a vital role in maintaining the local cache of a search engine. Today, the biggest challenge for a web crawler is keeping the information in its local cache fresh when that cache holds n constantly changing web pages. These web pages are often created and updated dynamically. Moreover, the resources available for downloading possible updates are limited. This problem is formulated as a non-deterministic optimization problem. Researchers have made many attempts to solve it, but most existing techniques work well only for small values of n and are often intractable for a large corpus. This paper presents an optimal solution to this non-deterministic problem for large data. The experimental results show that the technique achieves promising results compared with existing crawlers.
