Abstract

Chinese Web page classification (WPC) has become an active research area in data mining. To classify Web pages effectively, we present a Web page categorization method based on a least squares support vector machine (LS-SVM) combined with latent semantic analysis (LSA). LSA uses singular value decomposition (SVD) to obtain the latent semantic structure of the original term-document matrix, which addresses the problem of polysemous and synonymous keywords. LS-SVM is an effective method for learning classification knowledge from massive data, especially when labeled training examples are costly to obtain. We adopt a novel Web page representation and use a summarization algorithm to reduce the noise in Web pages. A preliminary experimental comparison shows encouraging results.
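The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the two building blocks it names, under stated assumptions. LSA is done by truncated SVD of a term-document matrix, with documents folded into the rank-k space as U_k^T d. The classifier is a binary LS-SVM in the standard Suykens formulation, trained by solving one linear system, with an RBF kernel. The toy matrix, the "sports"/"finance" vocabulary, the RBF kernel and the values of `k`, `gamma` and `sigma` are illustrative assumptions. The paper's Web page representation and summarization-based denoising are not reproduced, and a multi-class task would need an extra scheme such as one-vs-rest.

```python
import numpy as np


def lsa_basis(A, k):
    """Truncated SVD of a term-document matrix A (terms x documents).

    A = U S V^T; the first k columns of U span the rank-k latent semantic space.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k]


def lsa_project(U_k, D):
    """Fold documents (columns of D) into the latent space as U_k^T d.

    For the training matrix itself this equals S_k V_k^T. Returns one row per document.
    """
    return (U_k.T @ D).T


def rbf_kernel(X, Z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2)) for every row pair of X and Z."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))  # clip tiny negative round-off


def lssvm_train(X, y, gamma, sigma):
    """Binary LS-SVM classifier: solve a single (n+1) x (n+1) linear system.

    [ 0   y^T             ] [b    ]   [0]
    [ y   Omega + I/gamma ] [alpha] = [1],   Omega_ij = y_i y_j K(x_i, x_j)
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_decision(X_train, y_train, alpha, b, X_new, sigma):
    """Decision value f(x) = sum_i alpha_i y_i K(x, x_i) + b; the predicted class is sign(f)."""
    return rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b


# Hypothetical toy data: rows are 6 terms, columns are 6 training documents.
# Terms 0-2 stand for "sports" words and terms 3-5 for "finance" words.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 1],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 1, 1, 3],
], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # +1 = sports, -1 = finance

k, gamma, sigma = 2, 10.0, 1.5  # illustrative hyperparameters, not tuned
U_k = lsa_basis(A, k)
X = lsa_project(U_k, A)
b, alpha = lssvm_train(X, y, gamma, sigma)

# Two unseen documents, each containing a single term from one topic (columns of D_new).
D_new = np.array([[0, 2, 0, 0, 0, 0], [0, 0, 0, 0, 0, 2]], dtype=float).T
pred = np.sign(lssvm_decision(X, y, alpha, b, lsa_project(U_k, D_new), sigma))
print(pred)  # expected: sports (+1) for the first document, finance (-1) for the second
```

Training reduces to one dense linear solve instead of the quadratic program of a standard SVM, which is the usual argument for LS-SVM's efficiency. The price is that every training example gets a nonzero multiplier, so the solution is not sparse.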
