To address the difficulty of recognizing emotions in the elderly and the inability of traditional machine-learning models to capture nonlinear relationships in physiological signal data, a method combining a Recursive Map (RM) with a Vision Transformer (ViT) is proposed to recognize the emotions of the elderly from Electroencephalogram (EEG), Electrodermal Activity (EDA), and Heart Rate Variability (HRV) signals. The Dung Beetle Optimizer (DBO) is used to optimize the variational mode decomposition (VMD) of the EEG, EDA, and HRV signals. The optimized, decomposed time series are converted into two-dimensional images using RM, and these images are then fed to the ViT for emotion recognition. ViT weights pre-trained on the ImageNet-22k dataset are loaded into the model, which is retrained on the two-dimensional image data and then validated and compared with other models on the test set. The results show that the proposed method achieves recognition accuracies of 99.35%, 86.96%, and 97.20% on EEG, EDA, and HRV signals, respectively. This indicates that EEG signals best reflect the emotional states of the elderly, followed by HRV signals, while EDA signals are the least effective. Compared with Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbors (KNN), the proposed method improves recognition accuracy by at least 9.4%, 11.13%, and 12.61%, respectively. Compared with ResNet34, EfficientNet-B0, and VGG16, it improves accuracy by at least 1.14%, 0.54%, and 3.34%, respectively. These results demonstrate the advantage of the proposed method for emotion recognition in the elderly.
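The step that converts a one-dimensional time series into a two-dimensional image can be illustrated with a minimal sketch. It assumes that RM behaves like a standard thresholded recurrence matrix without time-delay embedding, which the text above does not confirm. The function name `recurrence_map`, the parameter `eps`, and the default threshold of 10% of the maximum pairwise distance are hypothetical choices made for illustration only, and the sine wave merely stands in for one decomposed signal component.

```python
import numpy as np


def recurrence_map(signal, eps=None):
    """Return a binary N x N image marking pairs of samples that are close in value.

    Illustrative only: no time-delay embedding is applied, and the default
    threshold is an assumption rather than a value taken from the text.
    """
    x = np.asarray(signal, dtype=float)
    # Pairwise absolute differences between all samples, computed by broadcasting.
    dist = np.abs(x[:, None] - x[None, :])
    if eps is None:
        eps = 0.1 * dist.max()  # hypothetical default threshold
    # Pixel (i, j) is 1 when samples i and j are within eps of each other.
    return (dist <= eps).astype(np.uint8)


# Stand-in for one decomposed component of a physiological signal.
t = np.linspace(0.0, 4.0 * np.pi, 128)
rm = recurrence_map(np.sin(t))
```

The resulting matrix is square and symmetric with a main diagonal of ones, so it can be saved or resized as a grayscale image before being passed to an image classifier.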