The electrocardiogram (ECG) is a simple and efficient tool for diagnosing cardiovascular diseases and is widely used in clinical practice. Because professional cardiologists are in short supply while ECG recordings are increasingly common, accurate and efficient arrhythmia detection has become an active research topic. In this paper, we propose a new multi-task deep neural network consisting of a shared low-level feature extraction module (i.e., SE-ResNet) and a task-specific classification module. A Contextual Transformer (CoT) block is introduced in the classification module to dynamically model the local and global information of the ECG feature sequence. The proposed method was evaluated on the public CPSC2018 and PTB-XL datasets, achieving average F1 scores of 0.827 and 0.833, respectively.
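
To make the reported metric concrete, the following is a minimal sketch of how an average F1 score can be computed for a multi-label ECG classifier. It assumes an unweighted (macro) average of per-class F1 scores over binary label vectors; the exact averaging protocol used for each dataset is not stated here, and the function names `f1_per_class` and `average_f1` are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def f1_per_class(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]],
                 class_idx: int) -> float:
    """F1 score for one class, given binary multi-label rows (one row per recording)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[class_idx], pred_row[class_idx]
        if t == 1 and p == 1:
            tp += 1          # class present and predicted
        elif t == 0 and p == 1:
            fp += 1          # class absent but predicted
        elif t == 1 and p == 0:
            fn += 1          # class present but missed
    denom = 2 * tp + fp + fn
    # Assumption: a class with no positives and no predictions scores 0.0.
    return 2 * tp / denom if denom else 0.0


def average_f1(y_true: Sequence[Sequence[int]],
               y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro average, assumed here)."""
    n_classes = len(y_true[0])
    scores = [f1_per_class(y_true, y_pred, c) for c in range(n_classes)]
    return sum(scores) / n_classes
```

Under this assumption, every class contributes equally to the average regardless of how many recordings carry that label, so rare arrhythmia classes weigh as much as common ones.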