Abstract. A city portrait is a social impression formed through the interaction between the public and the city. It helps us understand and perceive the nature and characteristics of the city, and thus supports urban development and governance. However, most existing studies extract thematic semantic labels globally and ignore both the order of the labels and the degree to which each contributes to its topic, which degrades the city portrait extraction results. Existing studies also lack an analysis of how using grid areas as the study scale affects city portraits. In this paper, we propose a new approach that accurately identifies city labels from grid-fused multi-source data, using a topic feature word extraction model (Weight-LdaVecNet) that combines fused topic word embedding with network structure analysis constrained by feature word weights. On this basis, we build a multi-level city portrait description framework using hierarchical cluster analysis, extract tag clusters, and compute a similarity matrix between topic feature tags and region feature tags through similarity analysis, yielding a fine-grained, multi-level city-region portrait. Experimental results show that, compared with the traditional LDA model, the city labels identified by our method that share similar thematic semantics exhibit strong aggregation, which demonstrates the effectiveness of the proposed method. In the overall multi-level city portrait, we further find that Beijing is strongly attractive in terms of cultural features. In the multi-level city-region portrait, however, the cultural dimension is unevenly distributed across regions, so a more rational allocation and planning of cultural resources is needed to better meet people's needs.