Abstract

The HITS algorithm has been highly successful in web link analysis. It is used to find authority pages and hub pages among the results returned by a search engine, and it can also be used to discover web communities. However, because HITS is based on the hyperlinks between pages, it is prone to topic drift. It also requires a base set of pages for its computation and cannot be applied to plain text. This paper introduces a new algorithm, PK-TDC, which borrows the iterative idea of HITS. PK-TDC finds authority pages and keywords on a page-keyword topology and clusters pages by the keywords they contain. Experiments show that PK-TDC performs well at extracting topics and clustering, both for pages with hyperlinks and for plain text.
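To make the iterative idea concrete, here is a minimal sketch of the standard HITS mutual-reinforcement loop, run on a small page-to-keyword graph. The abstract does not spell out PK-TDC's update rules, so this is not the paper's algorithm: the function name `hits`, the sample data, and the choice to treat each page as pointing to the keywords it contains are all illustrative assumptions.

```python
import math


def hits(links, iterations=50):
    """Plain HITS on a directed graph.

    links maps each source node to the nodes it points to.
    Returns (hub_scores, authority_scores), each L2-normalised.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all nodes pointing in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}

        # Hub update: sum the authority scores of all nodes pointed to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}

    return hub, auth


# Hypothetical data: each page "points to" the keywords it contains.
# This bipartite reading is an assumption, not taken from the paper.
page_keywords = {
    "p1": ["hits", "web"],
    "p2": ["hits", "cluster"],
    "p3": ["hits"],
}

hub, auth = hits(page_keywords)
top_keyword = max(auth, key=auth.get)
print(top_keyword)
```

In this reading, keywords shared by many strong pages accumulate authority, and pages containing strong keywords accumulate hub weight. Because the graph is built from page text rather than hyperlinks, the same loop also runs on plain-text documents, which is the property the abstract claims for PK-TDC.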
