Decoding semantic concepts for imagination and perception tasks (SCIP) is important for rehabilitation medicine as well as cognitive neuroscience. Electroencephalogram (EEG) is commonly used in the relevant fields, because it is a low-cost noninvasive technique with high temporal resolution. However, as EEG signals contain a high noise level resulting in a low signal-to-noise ratio, it makes decoding EEG-based semantic concepts for imagination and perception tasks (SCIP-EEG) challenging. Currently, neural network algorithms such as CNN, RNN, and LSTM have almost reached their limits in EEG signal decoding due to their own short-comings. The emergence of transformer methods has improved the classification performance of neural networks for EEG signals. However, the transformer model has a large parameter set and high complexity, which is not conducive to the application of BCI. EEG signals have high spatial correlation. The relationship between signals from different electrodes is more complex. Capsule neural networks can effectively model the spatial relationship between electrodes through vector representation and a dynamic routing mechanism. Therefore, it achieves more accurate feature extraction and classification. This paper proposes a spatio-temporal capsule network with a self-correlation routing mechaninsm for the classification of semantic conceptual EEG signals. By improving the feature extraction and routing mechanism, the model is able to more effectively capture the highly variable spatio-temporal features from EEG signals and establish connections between capsules, thereby enhancing classification accuracy and model efficiency. The performance of the proposed model was validated using the publicly accessible semantic concept dataset for imagined and perceived tasks from Bath University. Our model achieved average accuracies of 94.9%, 93.3%, and 78.4% in the three sensory modalities (pictorial, orthographic, and audio), respectively. The overall average accuracy across the three sensory modalities is 88.9%. Compared to existing advanced algorithms, the proposed model achieved state-of-the-art performance, significantly improving classification accuracy. Additionally, the proposed model is more stable and efficient, making it a better decoding solution for SCIP-EEG decoding.