Abstract
The HITS algorithm assigns the same weight to every link between Web pages, which results in topic drift. In this paper, a new focused crawling approach based on the particle swarm optimization (PSO) algorithm, PSOHITS, is proposed. The method uses PSO to selectively seek out pages that are relevant to a predefined set of topics, increases the chance of crawling pages that are reached through pages of low content relevance, and so broadens the crawler's search scope for relevant pages. In addition, hyperlink metadata is used to predict the topic relevance of the page a link points to, which speeds up crawling. Experiments show that the proposed algorithm improves the relevance ratio by 15% to 36%. Furthermore, it avoids topic drift well and improves the accuracy of information collection. The approach has both theoretical and practical value for search engine research.
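To illustrate why uniform link weights matter, the following is a minimal sketch of a HITS iteration in which each link carries a weight. It is not the PSOHITS method itself: the abstract does not specify how the weights are computed, so the function name `weighted_hits`, the edge-weight dictionary, and the example relevance values 0.9 and 0.1 are all assumptions made for illustration. With every weight set to 1.0 the sketch reduces to standard HITS.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(edges, iterations=50):
    """Run a HITS-style iteration with per-link weights.

    edges maps (source, target) -> non-negative link weight.
    The weights are hypothetical stand-ins for a topic-relevance
    estimate; the abstract does not define how they are obtained.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub: weighted sum of authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Uniform weights (standard HITS): both targets look equally authoritative.
uniform = {("seed", "on_topic"): 1.0, ("seed", "off_topic"): 1.0}
_, auth_uniform = weighted_hits(uniform)

# Assumed relevance weights: the on-topic target now ranks higher.
weighted = {("seed", "on_topic"): 0.9, ("seed", "off_topic"): 0.1}
_, auth_weighted = weighted_hits(weighted)
```

In the uniform case the on-topic and off-topic pages receive identical authority scores, which is the behavior the abstract associates with topic drift. Once the links carry different weights, the page behind the higher-weight link receives the higher authority score.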