With the rise of deep learning, motor imagery electroencephalogram (MI-EEG) recognition based on feature extractors and classifiers has achieved strong performance. However, the features extracted by most models are not sufficiently discriminative, and the models are limited to subject-specific classification. We propose a novel model, the Metric-based Spatial Filtering Transformer (MSFT), which uses an additive angular margin loss to push the deep model toward greater inter-class separability and intra-class compactness. In addition, a data augmentation method called EEG pyramid is applied to the model. Our model not only outperforms many recent benchmarks in subject-specific classification but can also be applied to cross-subject and even cross-task classification. We evaluated average accuracy on the BCI Competition IV 2a and 2b datasets. Subject-specific: 86.11% for 2a and 88.39% for 2b. Cross-subject: 61.92% for 2a. Cross-task: training the feature extractor on 2a data and then fine-tuning the classifier on 2b achieves an average accuracy of 83.38%. Our method is more general than most benchmarks and can handle a range of classification settings.
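To illustrate the idea behind an additive angular margin loss, the following is a minimal sketch for a single sample, written in plain Python. It is not the MSFT implementation: the function names, the scale of 30.0, and the margin of 0.5 are assumptions chosen for illustration. The embedding and the class weight vectors are L2-normalized so that each logit is the cosine of an angle, and the margin is added to the angle of the true class before the softmax cross-entropy is computed. This makes the true class harder to satisfy during training, which encourages compact classes that are well separated in angle.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def additive_angular_margin_loss(embedding, class_weights, label,
                                 scale=30.0, margin=0.5):
    """Single-sample additive angular margin loss (illustrative sketch).

    scale and margin are assumed values, not taken from the paper.
    Simplification: no special handling when theta + margin exceeds pi,
    which practical implementations usually add.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w = l2_normalize(w)
        cos_theta = sum(a * b for a, b in zip(e, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos domain
        if j == label:
            # Add the margin to the angle of the true class only.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    # Numerically stable softmax cross-entropy on the scaled cosines.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]
```

With the margin set to zero the function reduces to an ordinary softmax cross-entropy over scaled cosine similarities, so for the same inputs a positive margin yields a larger loss, provided the true-class angle plus the margin stays below pi.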