Abstract

PageRank is used in systems based on hyperlink structure, such as Google. TFIDF is widely used in IR systems based on the vector space model (VSM). It is worthwhile to combine the advantages of the two approaches. In this paper, we build a new model that uses both the content of Web pages and the links among them. We construct a transition probability matrix composed of link information and the relevance value of each page to the given query, where the relevance value is given by TFIDF. We obtain MixPR (mixed PageRank) by solving the equation whose coefficients are given by this matrix. In this model, the subset of pages used to compute TFIDF is first downloaded from the Internet, and the link information originating from those pages is also stored on a local server. The importance of a page is thus determined by both its content and its links. Experimental results show that the new model works well and that its precision approaches that of TFIDF.
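
The abstract does not state the exact form of the transition matrix, so the following Python fragment is only a minimal sketch of the general idea, not the authors' formulation. It assumes that the probability of moving from a page to one of its link targets is proportional to the target's TFIDF relevance to the query, that a standard damping factor of 0.85 applies, and that the ranking equation is solved by power iteration rather than directly. The function names `tfidf_relevance` and `mixed_pagerank`, the smoothed IDF variant, and the small constant `eps` are all illustrative assumptions.

```python
import math


def tfidf_relevance(docs, query):
    """Score each document by summing TF-IDF over the query terms.

    Uses whitespace tokenization and a smoothed IDF; both are assumptions
    made for this sketch, not details taken from the paper.
    """
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query_terms:
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            df = sum(1 for other in tokenized if term in other)
            idf = math.log((1 + n) / (1 + df)) + 1.0
            score += tf * idf
        scores.append(score)
    return scores


def mixed_pagerank(links, relevance, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank pages using links weighted by query relevance.

    links: dict mapping a page index to the list of page indices it links to.
    relevance: list of non-negative relevance scores, one per page.
    """
    n = len(relevance)
    eps = 1e-9  # keeps rows well defined when all targets have zero relevance

    # Build a row-stochastic transition matrix: row i spreads its probability
    # over the targets of page i in proportion to their relevance.
    transition = [[0.0] * n for _ in range(n)]
    for i in range(n):
        targets = links.get(i) or range(n)  # dangling page: any page is a target
        weights = {j: relevance[j] + eps for j in targets}
        total = sum(weights.values())
        for j, w in weights.items():
            transition[i][j] = w / total

    # Power iteration on r = (1 - d) / n + d * P^T r.
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(rank[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank
```

A possible use, under the same assumptions, is to compute `relevance = tfidf_relevance(docs, query)` over the locally stored pages and pass it to `mixed_pagerank` together with their outgoing links. Two pages with identical inbound links then receive different scores when their content differs in relevance to the query.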
