Abstract In this paper, based on the rapid head generation technique of the model library, we obtain a frontal 2D face photo and mark 13 feature points selected from 7 facial regions, such as the eyes, nose, and mouth, to determine the approximate positions of the facial features. We simulate the real color of the face through facial texture mapping and, at the same time, add the texture features of the target face to the matching model, realizing the mapping of the overlaid texture onto the neutral texture and the fusion of skin color. The HigherHRNet network is used to extract the coordinates of the joint points of intangible cultural heritage dance inheritors together with their heat maps, and the extracted key frames of the dance features are connected in a fixed order to obtain a synthesized folk dance video with pose estimation. Combining semantic segmentation of the key frames with style rendering, the visual image of the dance is designed, and intangible cultural heritage folk dance is analyzed through examples. The results show that the proposed method achieves more than 90% recognition accuracy on all three datasets and more than 94.8% recognition accuracy on the folk dance movement dataset. In the evaluation of folk dance movements, the algorithm yields the largest average distance of 95.7 and the lowest average score of 43.6 in the extreme case, while in the remaining cases its average distance and average score fall between these two extremes; the experimental results thus verify the effectiveness of the proposed method. This study promotes the intelligent protection and creative inheritance of Chinese intangible cultural heritage folk dances through digital preservation.
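The abstract only summarizes the pose-estimation step; as a purely illustrative aid, the minimal sketch below shows how joint coordinates can be decoded from per-joint heatmaps of the kind HigherHRNet outputs, using a simple per-channel argmax. The function name, array shapes, and joint count are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def decode_keypoints(heatmaps, frame_size):
    """Decode joint coordinates from pose-estimation heatmaps (illustrative sketch).

    heatmaps:   array of shape (num_joints, H, W), one confidence map per joint
    frame_size: (height, width) of the original video frame, used to rescale coordinates
    """
    num_joints, h, w = heatmaps.shape
    coords = np.zeros((num_joints, 3), dtype=np.float32)  # (x, y, confidence) per joint
    for j in range(num_joints):
        flat_idx = np.argmax(heatmaps[j])            # location of the peak response
        y, x = np.unravel_index(flat_idx, (h, w))
        coords[j, 0] = x * frame_size[1] / w         # rescale to frame width
        coords[j, 1] = y * frame_size[0] / h         # rescale to frame height
        coords[j, 2] = heatmaps[j, y, x]             # peak value as a confidence score
    return coords

# Hypothetical usage: 17 COCO-style joints on a 64x48 heatmap grid for a 256x192 frame
dummy_heatmaps = np.random.rand(17, 64, 48).astype(np.float32)
joints = decode_keypoints(dummy_heatmaps, frame_size=(256, 192))
print(joints.shape)  # (17, 3)
```

The per-frame coordinates obtained this way can then be concatenated across the selected key frames to form the pose sequence used for the synthesized dance video described above.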