Abstract

Search engines play a crucial role in today's Internet landscape, especially given the exponential growth in stored data. Ranking models, an integral component of a search engine, locate relevant pages and order them by decreasing relevance. The offline gathering of documents is crucial for returning more accurate and pertinent results to the user. With the web's ongoing expansion, the number of documents that need to be crawled has grown enormously. Because the resources available for continuous crawling are fixed, any academic or mid-sized organization must prioritize wisely which documents to crawl in each iteration. These priorities are realized by algorithms designed to operate within the existing crawling pipeline, and to avoid becoming the bottleneck of that pipeline the algorithms must be fast and efficient. A highly efficient and intelligent web crawler has been developed that employs the Hamming distance method to prioritize the pages to be downloaded in each iteration, making the crawling process more streamlined and effective. When compared with other existing methods, the implemented Hamming distance method achieves a high accuracy of 99.8%.
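The abstract does not specify how the Hamming distance is applied within the crawler, so the following is only a minimal illustrative sketch, not the paper's implementation. It assumes that each URL or page is reduced to a fixed-length binary fingerprint and that frontier URLs least similar (largest Hamming distance) to already-crawled pages are downloaded first; the function names `fingerprint`, `hamming_distance`, and `prioritise` are hypothetical and introduced purely for illustration.

```python
# Illustrative sketch only: prioritising crawl-frontier URLs by Hamming
# distance between fixed-length binary fingerprints. The fingerprinting
# scheme (a 64-bit slice of a SHA-256 hash) and the "largest distance
# first" policy are assumptions, not the paper's stated method.

import hashlib
import heapq


def fingerprint(text: str, bits: int = 64) -> int:
    """Derive a fixed-length binary fingerprint from a URL or page text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two equal-length fingerprints."""
    return bin(a ^ b).count("1")


def prioritise(frontier: list[str], crawled: list[str], top_k: int = 10) -> list[str]:
    """Rank frontier URLs by their minimum Hamming distance to crawled pages,
    selecting the most dissimilar (least redundant) candidates first."""
    crawled_fps = [fingerprint(u) for u in crawled]
    scored = []
    for url in frontier:
        fp = fingerprint(url)
        score = min((hamming_distance(fp, c) for c in crawled_fps), default=0)
        scored.append((score, url))
    # Largest distance first: prefer pages least similar to what is already crawled.
    return [url for _, url in heapq.nlargest(top_k, scored)]


if __name__ == "__main__":
    crawled = ["https://example.com/a", "https://example.com/b"]
    frontier = [
        "https://example.com/a?page=2",
        "https://other.org/new",
        "https://example.com/c",
    ]
    print(prioritise(frontier, crawled, top_k=2))
```

Under these assumptions, prioritization costs one fingerprint comparison per frontier/crawled pair, which keeps the scoring step cheap enough to sit inside an existing crawling pipeline without becoming its bottleneck.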
