Abstract

In traditional user portrait construction methods, static word vectors extract only shallow semantic representations and cannot handle word polysemy. Moreover, the commonly used K-means clustering algorithm requires the K value to be chosen in advance and is unstable with respect to initial centroid selection. To address these problems, a Bert-CK model based on BERT and CK-means+ is proposed. First, BERT is used to extract semantic and syntactic text features at various levels, and context-dependent word vectors and sentence vectors are obtained. Then, CK-means+, a K-means algorithm improved with Canopy and mean calculation, determines the K value and the initial centroids. The sentence vectors are input to CK-means+ to obtain user classification and topic features. Finally, the semantic features and topic features are fused and classified. CK-means+ is evaluated on the Sogou user portrait dataset. The experimental results verify that Bert-CK outperforms the baseline model.
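
The abstract does not spell out how CK-means+ combines Canopy and mean calculation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a single Canopy distance threshold (`t2`, a hypothetical parameter), takes the mean of each canopy as an initial centroid, and uses the number of canopies as K before running ordinary K-means. The function names are illustrative, and small 2-D toy points stand in for the BERT sentence vectors.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_centers(points, t2, seed=0):
    """Canopy pre-clustering (assumed variant with one threshold).

    Repeatedly pick a random remaining point as a pivot, group every
    remaining point closer than t2 into one canopy, and record the
    canopy mean. The number of canopies serves as K.
    """
    if t2 <= 0:
        raise ValueError("t2 must be positive")
    rng = random.Random(seed)
    remaining = list(points)
    centers = []
    while remaining:
        pivot = rng.choice(remaining)
        members = [p for p in remaining if math.dist(p, pivot) < t2]
        centers.append(mean(members))
        remaining = [p for p in remaining if math.dist(p, pivot) >= t2]
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, centers, n_iter=50):
    """Plain Lloyd iterations started from the given centers."""
    centers = list(centers)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(mean(cluster) if cluster else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


if __name__ == "__main__":
    # Toy data: three well-separated groups standing in for sentence vectors.
    rng = random.Random(1)
    blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [
        (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in blobs
        for _ in range(30)
    ]
    init = canopy_centers(data, t2=4.0)
    labels, final_centers = kmeans(data, init)
    print("K chosen by canopy step:", len(init))
```

In this sketch the threshold replaces K as the value that has to be chosen, which is the usual trade-off of Canopy-style initialization. How the paper sets its thresholds is not stated in the abstract.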