Abstract
This paper presents CUCS (Combined UC and SVM), a new algorithm for Web page classification with large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering) to achieve both high precision and high speed. In the training stage, CUCS uses UC to obtain clustering centers, which comprise positive example centers and negative example centers. CUCS then prunes the training set and trains an SVM classifier on the pruned set. In the classifying stage, CUCS calculates the minimum distance from a Web page to the positive centers and the minimum distance to the negative centers. If the difference between the two distances is large enough, the Web page is classified by UC; otherwise, it is classified by the pruned SVM. In experiments, CUCS achieves precision that is much higher than that of UC and slightly higher than that of SVM. In terms of running time, CUCS takes more time than UC but far less than SVM.
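The classifying-stage rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean metric, the +1/-1 label convention, the `threshold` parameter, and the `fallback` callable (a stand-in for the pruned SVM) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def min_distance(x: Vector, centers: Sequence[Vector]) -> float:
    """Distance from x to the nearest center (Euclidean metric is an assumption)."""
    return min(math.dist(x, c) for c in centers)


def cucs_classify(
    x: Vector,
    pos_centers: Sequence[Vector],
    neg_centers: Sequence[Vector],
    fallback: Callable[[Vector], int],
    threshold: float,
) -> int:
    """Return +1 (positive) or -1 (negative) for the feature vector x.

    `fallback` is a hypothetical stand-in for the pruned SVM classifier;
    `threshold` is a hypothetical name for "large enough".
    """
    d_pos = min_distance(x, pos_centers)
    d_neg = min_distance(x, neg_centers)
    # Clear-cut case: the page is much closer to one kind of center,
    # so the clustering result alone decides the label.
    if abs(d_pos - d_neg) >= threshold:
        return 1 if d_pos < d_neg else -1
    # Ambiguous case: defer to the slower but more precise classifier.
    return fallback(x)
```

Under these assumptions, only pages near the boundary between positive and negative centers reach the fallback classifier, which is consistent with the abstract's claim that CUCS costs more time than UC alone but far less than SVM alone.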