In the context of smart healthcare, the integration of multimedia and digital twin technologies has driven significant advances in telemedicine. Electrocardiogram (ECG) signals, as part of multimedia healthcare data, provide crucial digital information about the electrical activity of the heart, which is essential for diagnosing arrhythmias and safeguarding patients' health. Arrhythmia classification is a fundamental step in ECG analysis and a critical problem in the diagnosis of heart disease. Two key challenges remain: classification accuracy for arrhythmic heartbeats is still not high enough, and decision-making models lack interpretability. This study aims to improve arrhythmia classification performance while providing explainable diagnostic decision paths. We propose XDTEncoder, an explainable arrhythmia classification framework that leverages multi-level features to classify heartbeats while offering an explainable diagnostic decision path. XDTEncoder is novel in three respects. (1) It constructs a human-machine collaborative knowledge representation based on the Encoder-Decoder paradigm, which allows the model to classify arrhythmias while producing decision paths for cardiologists. (2) It compares two encoding methods, binary tree encoding and Huffman encoding, for embedding the diagnostic decision tree into the classification framework. (3) It fuses multi-level features to improve arrhythmia classification performance. Evaluation on five types of arrhythmias from the MIT-BIH database shows that XDTEncoder outperforms state-of-the-art classifiers while remaining interpretable.
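The abstract does not specify how XDTEncoder encodes its decision tree, so the following is only a generic illustration of the two encoding schemes it names, not the authors' implementation. Each heartbeat class is represented as a root-to-leaf path of binary decisions, either in a balanced binary tree (fixed-length codes) or in a Huffman tree built from class frequencies (shorter paths for frequent classes). The class labels N, S, V, F, Q (the AAMI heartbeat classes commonly used with MIT-BIH) and the weights are illustrative assumptions, not statistics from the paper; the function names `huffman_codes`, `binary_tree_codes`, and `expected_length` are hypothetical.

```python
import heapq
import itertools
import math


def huffman_codes(weights):
    """Huffman tree: each code is the root-to-leaf path (0 = left, 1 = right)."""
    counter = itertools.count()  # tie-breaker so heapq never has to compare dicts
    heap = [(w, next(counter), {label: ""}) for label, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single class still needs a one-step path.
        return {label: "0" for label in heap[0][2]}
    while len(heap) > 1:
        # Merge the two lightest subtrees under a new internal decision node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {label: "0" + code for label, code in left.items()}
        merged.update({label: "1" + code for label, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(counter), merged))
    return heap[0][2]


def binary_tree_codes(labels):
    """Balanced binary tree: every leaf sits at the same depth (fixed-length codes)."""
    depth = max(1, math.ceil(math.log2(len(labels))))
    return {label: format(i, f"0{depth}b") for i, label in enumerate(labels)}


def expected_length(codes, weights):
    """Average decision-path length, weighted by class frequency."""
    total = sum(weights.values())
    return sum(weights[label] * len(code) for label, code in codes.items()) / total


if __name__ == "__main__":
    # Illustrative weights only (assumed skew toward normal beats), not MIT-BIH counts.
    weights = {"N": 80, "S": 5, "V": 10, "F": 1, "Q": 4}
    huff = huffman_codes(weights)
    flat = binary_tree_codes(list(weights))
    print("Huffman paths:", huff, "avg length:", expected_length(huff, weights))
    print("Balanced paths:", flat, "avg length:", expected_length(flat, weights))
```

With these assumed weights, the Huffman tree gives the dominant class N a one-step path and an average path length of 1.35, versus a uniform 3 steps for the balanced tree. If a decoder emits one token per decision, this trade-off (shorter paths for common beats versus uniform depth for all classes) is one plausible reason to compare the two encodings.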