The preservation and transmission of traditional villages is crucial to the prosperity and development of ethnic cultures. However, current traditional village surveys usually require a large number of experts and scholars to conduct field research, which is an expensive and time-consuming method, especially for large-scale tasks. Therefore, this study proposes an automatic classification method based on deep learning (DL) for the identification of traditional village heritage value elements (TVHVE). The study evaluates four selected convolutional neural network (CNN) frames using traditional villages in Hubei Province as a sample dataset. The results show that Residual Network152 (ResNet152) is the most suitable CNN frame for identifying TVHVE in Hubei. The stability and consistency of various TVHVE present in the ResNet152 model were evaluated using Area Under Curve (AUC) and Precision Recall Curve (PRC), which indicated satisfactory prediction performance for most elements, except for specific elements such as tombstones and stone carvings, which showed lower accuracy. In addition, the study sheds light on the areas of concern of the model with respect to different TVHVE images and elucidates the reasons behind the confusion between elements through semantic clustering based on image classification and interpretability analysis using the Gradient-Weighted Class Activation Mapping (Grad-CAM) heat map. By using an automated classification method based on DL, this study significantly reduces the cost and effort associated with traditional surveys. At the same time, insight into areas of concern and confusion in the model improves guidance for conservation efforts and provides valuable references for subsequent research.