Abstract

Hyperlinks in HTML pages carry a wealth of information about the relationships among Web pages. Given a set of Web pages, we can explore the hyperlink relationships among them. This paper first provides formal definitions of hyperlink relations. We then use this notation to define similarity between two Web pages and between two sets of Web pages. For each case, we provide several definitions of similarity based on forward and backward links. Each similarity measure yields a value between 0 and 1. We also demonstrate how to use the similarity measures to study clustering within a set of pages and to determine the diversity of a set of Web pages.
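
To make the idea of a link-based similarity in [0, 1] concrete, the following is a minimal sketch and not the paper's own definition. It assumes that a page's forward links are the pages it points to, that its backward links are the pages pointing to it, and that similarity is measured as Jaccard overlap of those link sets. The Jaccard choice, the function names, and the toy graph are all illustrative assumptions.

```python
# Illustrative sketch only: Jaccard overlap is an assumed stand-in for the
# paper's similarity definitions, which are not given in the abstract.

def jaccard(a, b):
    """Return |a & b| / |a | b|, a value in [0, 1]; 0.0 if both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def forward_links(graph, page):
    """Pages that `page` links to (graph maps page -> iterable of targets)."""
    return set(graph.get(page, ()))


def backward_links(graph, page):
    """Pages that link to `page`."""
    return {src for src, targets in graph.items() if page in targets}


def page_similarity(graph, p, q):
    """Return (forward-link similarity, backward-link similarity) of p and q."""
    fwd = jaccard(forward_links(graph, p), forward_links(graph, q))
    bwd = jaccard(backward_links(graph, p), backward_links(graph, q))
    return fwd, bwd


# Hypothetical toy graph: each key links to the pages in its list.
graph = {
    "a": ["c", "d"],
    "b": ["c", "d", "e"],
    "c": ["a"],
    "d": [],
    "e": ["a", "b"],
}

fwd, bwd = page_similarity(graph, "a", "b")
print(fwd, bwd)
```

In this toy graph, pages "a" and "b" share two of three distinct forward targets and one of two distinct backward sources, so both scores fall strictly between 0 and 1. The same set-overlap idea could be extended to two sets of pages by comparing the unions of their link sets, though that extension is again an assumption rather than the paper's stated method.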
