Abstract

The deep web grows at a very fast pace, so there has been increasing interest in techniques that help locate deep-web interfaces efficiently. Given the large volume of web resources, it is important to achieve both wide coverage and high efficiency. To this end, we propose SmartCrawler, a two-stage framework for efficiently harvesting deep-web interfaces. In the first stage, the crawler performs site-based searching for center pages and avoids visiting irrelevant sites. In the second stage, an adaptive link-ranking technique searches within a relevant site by prioritizing its most relevant links. To eliminate bias toward highly relevant links hidden in web directories, we design a link tree data structure that achieves wider coverage of a website. Experimental results on several representative domains demonstrate the agility and accuracy of the proposed framework, which retrieves deep-web interfaces from a large volume of sites and achieves higher harvest rates than other crawlers.
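To make the second stage concrete: adaptive link ranking amounts to scoring each extracted link for relevance and always following the highest-ranked link first. The following is a minimal illustrative sketch of such a prioritized link frontier in Python; the term weights and the scoring function are assumptions for illustration only, not the feature model actually learned by SmartCrawler.

```python
import heapq
from urllib.parse import urlparse

# Hypothetical relevance weights; in SmartCrawler these would be
# adapted online from pages already found to contain searchable forms.
RELEVANT_TERMS = {"search": 2.0, "query": 1.5, "database": 1.5, "form": 1.0}

def link_priority(url: str, anchor_text: str) -> float:
    """Score a link by simple term matching on its URL path and anchor text."""
    text = (urlparse(url).path + " " + anchor_text).lower()
    return sum(w for term, w in RELEVANT_TERMS.items() if term in text)

class LinkFrontier:
    """In-site frontier that always pops the highest-ranked unseen link."""
    def __init__(self):
        self._heap = []   # min-heap of (-score, url), so max score pops first
        self._seen = set()

    def push(self, url: str, anchor_text: str = ""):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-link_priority(url, anchor_text), url))

    def pop(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

frontier = LinkFrontier()
frontier.push("http://example.com/about", "About us")
frontier.push("http://example.com/search", "Advanced search form")
print(frontier.pop())  # -> http://example.com/search, the higher-ranked link
```

In this sketch the ranking is a fixed keyword score; the adaptive element of the paper would replace `RELEVANT_TERMS` with weights updated as the crawl discovers new relevant pages.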
