Extracting invariant representations in unlabeled electrocardiogram (ECG) signals is a challenge for deep neural networks (DNNs). Contrastive learning is a promising method for unsupervised learning. However, it should improve its robustness to noise and learn the spatiotemporal and semantic representations of categories, just like cardiologists. This article proposes a patient-level adversarial spatiotemporal contrastive learning (ASTCL) framework, which includes ECG augmentations, an adversarial module, and a spatiotemporal contrastive module. Based on the ECG noise attributes, two distinct but effective ECG augmentations, ECG noise enhancement, and ECG noise denoising, are introduced. These methods are beneficial for ASTCL to enhance the robustness of the DNN to noise. This article proposes a self-supervised task to increase the antiperturbation ability. This task is represented as a game between the discriminator and encoder in the adversarial module, which pulls the extracted representations into the shared distribution between the positive pairs to discard the perturbation representations and learn the invariant representations. The spatiotemporal contrastive module combines spatiotemporal prediction and patient discrimination to learn the spatiotemporal and semantic representations of categories. To learn category representations effectively, this article only uses patient-level positive pairs and alternately uses the predictor and the stop-gradient to avoid model collapse. To verify the effectiveness of the proposed method, various groups of experiments are conducted on four ECG benchmark datasets and one clinical dataset compared with the state-of-the-art methods. Experimental results showed that the proposed method outperforms the state-of-the-art methods.