Abstract

The HITS algorithm is a well-known topic distillation algorithm, but it suffers from topic drift. To tackle this problem, an improved HITS algorithm is proposed that assigns weights to links according to their link value and topic similarity. Link value is calculated from the authority degree of web pages, based on an analysis of web link structure; topic similarity between web pages is calculated by combining analysis of page content with HTML structure characteristics. By combining link value with topic similarity, the improved algorithm distinguishes between links and assigns them different weights. Experimental results indicate that the proposed algorithm improves the relevance ratio by 13%-42%. Furthermore, it controls topic drift well and improves the accuracy of information collection. The proposed algorithm can be applied in vertical search engines, for which it lays an important theoretical foundation.
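
To illustrate the general idea of link-weighted HITS, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the weighting formula, so the helper `link_weight`, its linear blend, and the parameter `alpha` are hypothetical, as are the function name `weighted_hits` and the toy graph in the usage note.

```python
import math


def link_weight(link_value, topic_similarity, alpha=0.5):
    # Hypothetical blend of the two signals named in the abstract.
    # The paper's actual combination rule is not stated there.
    return alpha * link_value + (1.0 - alpha) * topic_similarity


def normalize(scores):
    # Scale scores to unit Euclidean length so they stay bounded.
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def weighted_hits(edges, iterations=50):
    """Run HITS where each link contributes in proportion to its weight.

    edges maps (source, target) pairs to non-negative link weights.
    Returns (authority, hub) dictionaries keyed by page.
    """
    pages = sorted({page for edge in edges for page in edge})
    auth = {page: 1.0 for page in pages}
    hub = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of pages linking in.
        new_auth = {page: 0.0 for page in pages}
        for (src, dst), weight in edges.items():
            new_auth[dst] += weight * hub[src]
        # Hub: weighted sum of authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for (src, dst), weight in edges.items():
            new_hub[src] += weight * new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub
```

With a toy graph such as `{("a", "c"): 0.9, ("b", "c"): 0.8, ("a", "d"): 0.1}`, the low-weight link to `d` contributes little, so `d` receives a lower authority score than `c`. This is the mechanism by which down-weighting off-topic links can limit topic drift.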
