Abstract

Text categorization is used to assign each text document to predefined categories. This paper presents a new text classification method for classifying Chinese text based on Rocchio algorithm. We firstly use the TFIDF to extract document vectors from the training documents which have been correctly categorized, and then use those document vectors to generate codebooks as classification models using the LBG and Rocchio algorithm. The codebook is then used to categorize the target documents using vector scores. We tested this method in the experiment and the result shows that this method can achieve better performance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.