Abstract

Web page clustering is a fundamental technique for managing, locating, and interpreting Web data, and for helping users navigate, discriminate among, and understand Web pages. Most existing clustering algorithms do not adapt well to Web pages, in either efficiency or effectiveness, because of high dimensionality and data sparseness. The uncontrolled nature of Web content poses additional challenges, whereas the interconnected structure of hypertext can provide useful information for the clustering process. To address these problems, we propose a new Web page clustering method that incorporates the content of neighboring pages to overcome data sparseness and uses Iterative Feature Selection to remove noisy and redundant features and improve clustering performance. Experimental results show that the proposed method significantly improves the performance of Vietnamese Web page clustering while using a relatively small number of good descriptive features for the pages.
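
The idea of combining neighbors' content can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the whitespace tokenizer, the function names `term_counts` and `combine_with_neighbors`, and the neighbor discount `weight=0.5` are all assumptions made for illustration, and the Iterative Feature Selection step is not shown.

```python
from collections import Counter


def term_counts(text):
    """Count terms using lowercase whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def combine_with_neighbors(pages, links, weight=0.5):
    """Add each linked neighbor's term counts to a page's own, at a discount.

    pages:  dict mapping page id -> page text
    links:  dict mapping page id -> iterable of neighbor page ids
    weight: hypothetical discount applied to neighbor terms (assumption)
    """
    own = {pid: term_counts(text) for pid, text in pages.items()}
    combined = {}
    for pid, counts in own.items():
        # Start from the page's own terms at full weight.
        vec = {term: float(count) for term, count in counts.items()}
        # Add the terms of each known neighbor at reduced weight.
        for nid in links.get(pid, ()):
            for term, count in own.get(nid, {}).items():
                vec[term] = vec.get(term, 0.0) + weight * count
        combined[pid] = vec
    return combined


# Toy data: pages "a" and "b" link to each other, "c" is isolated.
pages = {
    "a": "football match score",
    "b": "football league table",
    "c": "stock market report",
}
links = {"a": ["b"], "b": ["a"], "c": []}
vectors = combine_with_neighbors(pages, links)
```

In this toy data, page "a" gains the terms "league" and "table" from its neighbor "b", so the two linked pages share more non-zero features than their own text alone provides. The isolated page "c" keeps only its own terms.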
