Adaptive encoding-based evolutionary approach for Chinese document clustering

Jun-Xian Chen, Yue-Jiao Gong, Xiaolin Xiao, Wei-Neng Chen

doi:10.1007/s40747-022-00934-z

Open Access

https://doi.org/10.1007/s40747-022-00934-z

Abstract

Document clustering has long been an important research direction in intelligent systems. Applying it to Chinese documents poses new challenges, since Chinese text cannot be split into words directly on whitespace characters. Moreover, many Chinese document clustering algorithms require the number of clusters to be known in advance, which is rarely the case in real-world applications. To address these problems, we propose a general Chinese document clustering framework in which the main clustering task is carried out by an adaptive encoding-based evolutionary approach. Specifically, an adaptive encoding scheme is proposed to learn the cluster number automatically, and novel crossover and mutation operators are designed to fit this scheme. In addition, a single step of K-means is incorporated to conduct a joint global and local search, enhancing the overall exploitation ability. Experiments on benchmark datasets demonstrate the superiority of the proposed method in both efficiency and clustering precision.
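
The single K-means step mentioned in the abstract can be illustrated with a minimal sketch. This is standard K-means (one assignment pass followed by one centroid update), not the paper's exact procedure. The function names, the squared Euclidean distance, and the handling of empty clusters (keeping the old centroid) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_single_step(
    points: List[Sequence[float]],
    centroids: List[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Run one assignment pass and one update pass of standard K-means.

    Hypothetical helper: in an evolutionary setting, `centroids` would be
    the cluster centres decoded from one individual, and the refined
    centroids would be written back as a local-search improvement.
    """
    k = len(centroids)

    # Assignment: each point joins its nearest centroid.
    labels = [
        min(range(k), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]

    # Update: each centroid moves to the mean of its assigned points.
    refined = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            refined.append(list(centroids[j]))
            continue
        dim = len(members[0])
        refined.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return refined, labels


if __name__ == "__main__":
    docs = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    start = [[1.0, 1.0], [9.0, 9.0]]
    print(kmeans_single_step(docs, start))
```

Running only one such step per generation keeps the local refinement cheap, while the evolutionary operators remain responsible for global exploration.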

Full Text