Abstract Demand for greater realism in film, television, gaming, tourism, virtual display, and cultural heritage preservation has driven notable advances in reverse modeling technology, which extracts three-dimensional models from image sequences. This study examines the classification of 3D reconstruction methods for the animation restoration of historical scenes and sets out a technical pathway for reconstructing such scenes. A clustering-based Scale Invariant Feature Transform (SIFT) feature-matching acceleration algorithm is used to extract and match image features. The Bundler method of camera self-calibration is then combined with the Patch-based Multi-view Stereo (PMVS) algorithm for dense reconstruction, and the reconstructed historical scene is assembled and tested. With 10 images, the accelerated SIFT algorithm takes longer than the standard algorithm; with more than 10 images, the standard algorithm takes the longest. When the number of images exceeds 50, the acceleration ratio for sigma = 60 is greater than that for sigma = 120. These results indicate that the clustering-based SIFT algorithm is suited to accelerated matching of large-scale image sets, and that the acceleration effect can be improved by appropriately decreasing sigma. The maximum difference between the five independent vectors in model 1 and model 2 is 3.18; after model 2 is scaled, the differences narrow to the range [0.01, 0.03]. The small model error in scenes 3 to 9 and the clear texture indicate that the 3D reconstruction model of the historical scene designed in this paper has high accuracy and provides a model reference for the animation restoration of historical scenes.
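The comparison of model 1 with a scaled model 2 can be illustrated with a minimal sketch. The abstract does not say how the scale factor was obtained, so the least-squares ratio used here is an assumption, and the function names and the five vector lengths are hypothetical values chosen for illustration, not data from the paper.

```python
def estimate_scale(lengths_ref, lengths_other):
    """Least-squares scale s that minimizes sum((ref - s * other)^2).

    Assumption: the two models differ only by a uniform scale, which is
    typical of reconstructions recovered up to an unknown scale.
    """
    num = sum(r * o for r, o in zip(lengths_ref, lengths_other))
    den = sum(o * o for o in lengths_other)
    return num / den


def length_differences(lengths_ref, lengths_other, scale=1.0):
    """Absolute difference between corresponding vector lengths."""
    return [abs(r - scale * o) for r, o in zip(lengths_ref, lengths_other)]


# Hypothetical lengths of five corresponding vectors in each model.
model1_lengths = [2.0, 4.0, 6.0, 8.0, 10.0]
model2_lengths = [1.01, 1.99, 3.0, 4.02, 4.98]

scale = estimate_scale(model1_lengths, model2_lengths)
before = length_differences(model1_lengths, model2_lengths)
after = length_differences(model1_lengths, model2_lengths, scale)

print("estimated scale:", scale)
print("max difference before scaling:", max(before))
print("max difference after scaling:", max(after))
```

If the two reconstructions agree up to scale, the differences after scaling should be much smaller than before, which is the pattern the abstract reports for model 1 and model 2.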