Abstract Different people have different emotional responses to the same scene or exhibit, which lowers the accuracy of emotion recognition and complicates the evaluation of visitors' emotional experience in digital museums. To improve the user experience of digital museums, this work studies methods for evaluating and optimizing visitors' emotional experience based on virtual reality (VR) technology and an emotion recognition algorithm. A spectrogram is generated from the speech that visitors produce while touring the digital museum and is used as input to the CSWNet_CRNN deep learning model for emotion recognition, which evaluates each visitor's emotional experience and yields their emotional response to the digital museum. The visual and auditory features of digital museum scenes associated with positive emotional experience are then extracted separately. Using VR technology, the extracted features are applied to each stage of scene content design, optimizing the digital museum VR scene and improving the visitor experience. Experiments show that the method recognizes visitor emotions with high accuracy, reaching 100% on 300 randomly selected visitors. When generating new scenes, the feature extraction results are consistent with how ordinary people judge the features of positive emotions. Scenes optimized with the extracted features show better realism and detail accuracy, are favored by most people, and promote the sustainable development of digital museums.
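
The first step of the pipeline, turning a visitor's speech into a spectrogram for the recognition model, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the frame length, hop size, Hann window, log scaling, and the function name `log_spectrogram` are all assumptions, and a synthetic tone stands in for recorded speech.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame gives the short-time spectrum.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; the small constant avoids log(0).
    return np.log(magnitude + 1e-10).T


# Synthetic stand-in for speech: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
spec = log_spectrogram(tone)
```

The resulting 2-D array is the kind of input that would be passed to a model such as CSWNet_CRNN; the model itself is not sketched here because the abstract does not describe its architecture.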