Abstract
Using the word-segmentation technology of the TRIP database, together with a detailed count of every word that appears in the database, a self-constructed category dictionary (SCC-dictionary) for Chinese text classification is proposed. To address the high dimensionality and sparseness that exist in the vector space model, a four-dimensional feature vector space model (FFVSM) is presented in this paper. The text classifier is designed with the Support Vector Machine (SVM) algorithm. Experimental results show two achievements: first, the SCC-dictionary can replace a manually written dictionary with the same effect; second, the FFVSM reduces the computing load compared with a high-dimensional feature vector space model while keeping the classification precision at 86.87%, the recall at 95.12%, and the F1 value at 90.81%.
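As a quick consistency check on the reported figures, the F1 value can be recomputed from the stated precision and recall, assuming the standard definition of F1 as their harmonic mean (the abstract does not state which definition was used). The function name `f1_score` below is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, expressed as fractions.
precision = 0.8687
recall = 0.9512

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 90.81%"
```

Under that assumption, the recomputed value agrees with the reported F1 of 90.81%.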