Diffusion tensor imaging (DTI) is a promising technique for non-invasively investigating the myocardial fiber structure of the human heart. However, low signal-to-noise ratio (SNR) remains a major limitation of cardiac DTI and prevents accurate detection of myocardial structure. It is therefore important to remove the effect of noise on diffusion-weighted (DW) images. Although conventional and deep learning-based denoising methods have shown potential for dealing effectively with the noise in DW images, most of them either depend on redundant information or require noise-free images as the gold standard. In addition, existing DW image denoising methods often suffer from over-smoothing. To address these issues, we propose a self-supervised learning model, the structural similarity-based convolutional neural network with edge-weighted loss (SSECNN), to remove noise effectively in cardiac DTI. Considering that DW images acquired along different diffusion directions are structurally similar, and that the noise in these DW images is independent and identically distributed, we propose a structural similarity-based matching algorithm to search for the most similar DW images. These similar noisy DW image pairs are then used as the input and target of the denoising network SSECNN, which consists of several convolutional and residual blocks. Through self-supervised training with these image pairs, the network can restore clean DW images and retain the correlations between the denoised DW images along different directions. To avoid over-smoothing, we design a novel edge-weighted loss that enables the network to adaptively adjust the loss weights over iterations and thereby improves the detail-preserving ability of the model. To verify the superiority of the proposed method, we compare it with state-of-the-art (SOTA) denoising methods on both synthetic and real acquired DTI datasets.
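The two ingredients described above, similarity-based pairing and an edge-weighted loss, can be illustrated with a minimal NumPy sketch. It is not the SSECNN implementation: `global_ssim` is a simplified single-window SSIM rather than the usual sliding-window version, `match_most_similar` is a hypothetical brute-force search, and `edge_weighted_mse` assumes a weight map of the form 1 + `alpha` × normalized gradient magnitude. The abstract does not specify the actual weighting or how it changes over iterations, so `alpha` is left as a plain parameter.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window (global) SSIM between two images of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def match_most_similar(dw_images, data_range=1.0):
    """For each DW image (one per diffusion direction), return the index
    of the other image with the highest global SSIM (assumed criterion)."""
    n = len(dw_images)
    if n < 2:
        raise ValueError("need at least two DW images to form a pair")
    scores = np.full((n, n), -np.inf)  # -inf on the diagonal excludes self-matches
    for i in range(n):
        for j in range(i + 1, n):
            s = global_ssim(dw_images[i], dw_images[j], data_range)
            scores[i, j] = scores[j, i] = s
    return scores.argmax(axis=1)


def edge_weighted_mse(pred, target, alpha=1.0):
    """MSE with per-pixel weights that grow near edges of the target.
    The weight form 1 + alpha * |grad| / max|grad| is an assumption."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    gy, gx = np.gradient(target)  # finite-difference gradients along rows, columns
    mag = np.sqrt(gx ** 2 + gy ** 2)
    peak = mag.max()
    edge = mag / peak if peak > 0 else np.zeros_like(mag)
    weights = 1.0 + alpha * edge
    return float(np.mean(weights * (pred - target) ** 2))
```

In a training loop, each image and its matched partner would serve as the input and target of the denoising network, and `alpha` could be varied per iteration to mimic an adaptive weighting.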
Experimental results show that SSECNN can effectively reduce the noise in DW images while preserving detailed texture and edge information, and therefore achieves better performance in DTI reconstruction. For the synthetic dataset, at a noise level of 10%, the root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) between the DW images denoised with SSECNN and the noise-free DW images are improved by 6.94%, 1.98%, and 0.76%, respectively, compared with the SOTA method. For the acquired cardiac DTI dataset, SSECNN significantly improves the SNR and contrast-to-noise ratio (CNR) of cardiac DW images and yields more regular helix angle (HA) and transverse angle (TA) maps. The ablation results confirm that using the structural similarity-based method to search for similar DW image pairs yields the smallest loss, and that with the edge-weighted loss, the denoised DW images and diffusion metric maps preserve more details. The proposed SSECNN method can fully exploit the similarity between DW images along different diffusion directions. Using this similarity together with an edge-weighted loss enables effective self-supervised denoising of cardiac DTI. Our method overcomes the dependence on redundant information and the over-smoothing problem of the SOTA methods.
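For readers who wish to reproduce this kind of comparison, the sketch below shows how RMSE and PSNR against a noise-free reference can be computed, and how a percentage improvement over a baseline might be derived. The abstract does not state how the reported percentages were calculated; the relative-change definition in `relative_improvement` (a hypothetical helper) is an assumption, as is the default `data_range` of 1.0.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    err = rmse(reference, estimate)
    if err == 0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / err))


def relative_improvement(baseline, proposed, lower_is_better=False):
    """Percentage change of `proposed` relative to `baseline`
    (assumed definition), positive when `proposed` is better."""
    change = baseline - proposed if lower_is_better else proposed - baseline
    return 100.0 * change / abs(baseline)
```

Under this definition, RMSE would be passed with `lower_is_better=True`, whereas PSNR and SSIM would use the default.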