Abstract

Chinese text classification is an important data mining task that extracts category features from unstructured content. Conventional Chinese text classification models leverage only the surface features of the original text and thus omit the extensional knowledge associated with each word. To capture the semantic features of each word more comprehensively, this paper proposes a Chinese news text classification algorithm based on online knowledge extension and a convolutional neural network (OKE-CNN), which uses a knowledge graph to extend latent semantic information and a CNN to predict the category. Unlike the baseline methods, OKE-CNN uses surface and latent features simultaneously, which lets it adapt to complex scenarios such as sparse data and unclear topics. In our experiments, OKE-CNN outperforms state-of-the-art competitors, achieving 97.94% and 87.03% on the THUCNews and TouTiao datasets, respectively.
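
The two stages named above, knowledge extension followed by convolutional feature extraction, can be illustrated with a minimal sketch. It is not the paper's implementation: the dictionary `TOY_KG`, the helpers `extend_tokens` and `conv_max_pool`, the English placeholder tokens (standing in for segmented Chinese words), and the hand-written vectors are all hypothetical, and a local lookup replaces the online knowledge source.

```python
from typing import Dict, List

# Hypothetical stand-in for an online knowledge graph: word -> related concepts.
TOY_KG: Dict[str, List[str]] = {
    "Messi": ["football", "athlete"],
    "Apple": ["company", "phone"],
}


def extend_tokens(tokens: List[str], kg: Dict[str, List[str]],
                  max_per_word: int = 2) -> List[str]:
    """Insert up to max_per_word related concepts after each token found in kg."""
    extended: List[str] = []
    for tok in tokens:
        extended.append(tok)                       # surface feature
        extended.extend(kg.get(tok, [])[:max_per_word])  # latent features
    return extended


def conv_max_pool(embedded: List[List[float]],
                  kernel: List[List[float]]) -> float:
    """Slide one filter over the sequence, apply ReLU, keep the maximum response."""
    width = len(kernel)
    responses: List[float] = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        score = 0.0
        for k_vec, x_vec in zip(kernel, window):
            score += sum(k * x for k, x in zip(k_vec, x_vec))
        responses.append(max(score, 0.0))
    # Sequences shorter than the filter produce no response.
    return max(responses) if responses else 0.0


extended = extend_tokens(["Messi", "scores"], TOY_KG)
# Assumed 2-dimensional embeddings for a 3-token sequence, and one width-2 filter.
feature = conv_max_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]])
```

In a full model, the extended token sequence would be embedded, passed through many such filters of different widths, and the pooled features fed to a classifier. Those details are not given in the abstract and are left out here.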
