Abstract

Topic distillation on the Web, that is, finding quality information sources related to a given query topic through hyperlink analysis, has been shown to be useful in Web information retrieval (IR). Based on an analysis of three deficiencies of the classical topic distillation algorithm HITS, this paper presents an improved model and algorithm named s-HITSc. Given a query topic, the improved algorithm models a neighborhood graph at site granularity, computes the relevance weight of each node to the topic through content analysis, and applies weighted I/O operations in its iterative hyperlink analysis. Theoretical analysis and experimental results show that s-HITSc can control topic drift and identify more reasonable and meaningful authority and hub sites on a given topic.
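
The abstract does not state the update rules of s-HITSc, so the following is only a minimal sketch of how relevance weights might enter the I (authority) and O (hub) operations of a HITS-style iteration. It is not the authors' algorithm. The function name `weighted_hits`, the `edges` and `relevance` inputs, the choice to scale each contribution by the relevance of the linking or linked site, and the L2 normalization are all assumptions made for illustration. Nodes are taken to be sites rather than pages, and the relevance weights are assumed to come from some separate content-analysis step.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit L2 norm; leave an all-zero vector unchanged."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def weighted_hits(edges, relevance, iterations=50):
    """Hypothetical relevance-weighted HITS over a site-level graph.

    edges:     list of (source_site, target_site) links
    relevance: dict mapping site -> topic relevance weight (assumed in [0, 1])
    Returns (authority, hub) dicts of normalized scores.
    """
    nodes = set(relevance)
    for src, dst in edges:
        nodes.update((src, dst))

    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Weighted I operation (assumed form): a site's authority is the sum
        # of the hub scores of sites linking to it, each scaled by the
        # linking site's relevance to the topic.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += relevance.get(src, 0.0) * hub[src]
        auth = _normalize(new_auth)

        # Weighted O operation (assumed form): a site's hub score is the sum
        # of the authority scores of sites it links to, each scaled by the
        # linked site's relevance to the topic.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += relevance.get(dst, 0.0) * auth[dst]
        hub = _normalize(new_hub)

    return auth, hub
```

Under these assumptions, a site with zero relevance passes no score along its links, which is one simple way that off-topic regions of the neighborhood graph could be kept from dominating the result (the topic drift mentioned above).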
