Emotion recognition from EEG signals has attracted much attention in affective computing. Recently, a novel dynamic graph convolutional neural network (DGCNN) model was proposed, which simultaneously optimizes the network parameters and a weighted graph <inline-formula><tex-math notation="LaTeX">$G$</tex-math></inline-formula> characterizing the strength of the functional relation between each pair of electrodes in the EEG recording equipment. In this article, we propose a sparse DGCNN model that modifies DGCNN by imposing a sparseness constraint on <inline-formula><tex-math notation="LaTeX">$G$</tex-math></inline-formula> and improves emotion recognition performance. Our work is based on an important observation: tomography studies reveal that different brain regions sampled by EEG electrodes may be related to different functions of the brain, so the functional relations among electrodes are likely highly localized and sparse. However, introducing a sparseness constraint on the graph <inline-formula><tex-math notation="LaTeX">$G$</tex-math></inline-formula> makes the loss function of sparse DGCNN non-differentiable at some singular points. To ensure that the training process of sparse DGCNN converges, we apply the forward-backward splitting method. To evaluate the performance of sparse DGCNN, we compare it with four representative recognition methods (SVM, DBN, GELM, and DGCNN). In addition to comparing different recognition methods, our experiments also compare different features and spectral bands, including EEG features in the time-frequency domain (DE, PSD, DASM, RASM, ASM, and DCAU on different bands) extracted from four representative EEG datasets (SEED, DEAP, DREAMER, and CMEED).
The results show that (1) sparse DGCNN consistently achieves better accuracy than the representative methods and has good scalability, and (2) DE, PSD, and ASM features on the <inline-formula><tex-math notation="LaTeX">$\gamma$</tex-math></inline-formula> band convey the most discriminative emotional information, and fusing separate features and frequency bands can further improve recognition performance.
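The forward-backward splitting update mentioned above can be sketched as a proximal-gradient step on the graph weights. The sketch below is a minimal illustration assuming an L1 sparseness penalty whose proximal operator is soft-thresholding; the function names, learning rate, and penalty weight are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_threshold(W, tau):
    # Proximal operator of the L1 norm: shrinks each entry toward zero
    # and sets small entries exactly to zero, producing a sparse graph.
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def forward_backward_step(W, grad_W, lr, lam):
    # Forward step: ordinary gradient descent on the smooth part of the loss.
    W_forward = W - lr * grad_W
    # Backward step: proximal mapping of the non-smooth L1 penalty, which
    # handles the singular (non-differentiable) points at zero.
    return soft_threshold(W_forward, lr * lam)

# Hypothetical example: a 2x2 block of graph weights with zero gradient,
# so only the sparseness step acts.
W = np.array([[0.5, -0.2],
              [0.05, 0.0]])
W_new = forward_backward_step(W, np.zeros_like(W), lr=0.1, lam=1.0)
```

Entries smaller in magnitude than `lr * lam` are driven exactly to zero, which is why the proximal step, rather than a subgradient step, yields genuinely sparse functional-connectivity graphs.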