Heart sound analysis is a non-invasive and economical technique that can aid in diagnosing cardiovascular disease. This paper proposes a novel end-to-end heart sound classification method that combines a multi-scale dense network with a multi-head recurrent neural network. The method can be used to diagnose congenital heart disease (CHD) without manual feature extraction. On dataset A, which consists of 5,000 signals from 1,000 individuals, the method achieved an Fβ score of 94.33% and an accuracy of 94.41%. On the widely used dataset B (the PhysioNet/CinC 2016 dataset), which comprises 3,240 signals from 764 individuals, it achieved an Fβ score of 93.75% and an accuracy of 92.97%. These results indicate that the proposed method has significant potential to assist in the diagnosis of CHD. In addition, SHAP (SHapley Additive exPlanations), an interpretability method, was applied to interpret the model's predictions. This analysis showed that the model's prediction process resembles the way a doctor makes a diagnosis.