Cardiovascular disease (CVD) is a common disease with a high mortality rate, and carotid atherosclerosis (CAS) is one of its leading causes. Multi-sequence carotid MRI can not only identify carotid atherosclerotic plaque constituents with high sensitivity and specificity, but also capture different morphological features, which can help doctors improve diagnostic accuracy. However, it is difficult to accurately evaluate the evolution of local changes in carotid atherosclerosis in multi-sequence MRI, because imaging parameters are inconsistent across sequences and the motion of tissues and organs causes geometric misalignment between images. To solve these problems, we propose a cross-scale multi-modal image registration method based on the Siamese U-Net. The network uses sub-networks with image inputs of different sizes to extract various features, and a special padding module is designed so that the network can be trained on cross-scale features. In addition, to improve registration performance, a multi-scale loss function under Gaussian smoothing is applied for optimization. For the experiments, we collected a multi-sequence MRI dataset from 11 patients with carotid atherosclerosis for a retrospective study. We evaluate the overall architecture by cross-validation on this carotid dataset. The experimental results show that our method can generate precise and reliable results with cross-scale multi-sequence inputs, and that registration accuracy can be greatly improved by using the Gaussian smoothing loss function. The Dice similarity coefficient (DSC) of our Siamese structure can reach 84.1% on the carotid dataset with cross-size input. With the GDSC loss, the average DSC can be improved by 5.23%, while the average distance between fixed landmarks and moving landmarks can be decreased by 6.46%. Our code is publicly available at: https://github.com/MingHan98/Cross-scale-Siamese-Unet.
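To make the idea of a multi-scale Dice loss under Gaussian smoothing more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation, which is in the linked repository. The function names, the sigma values, the squared-denominator form of the Dice score, and the plain averaging over scales are all assumptions chosen for illustration.

```python
import numpy as np


def dice(a, b, eps=1e-8):
    """Dice score with squared terms in the denominator.

    For binary masks this equals the usual DSC; the squared form is an
    assumption made here so that identical soft masks still score 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    inter = (a * b).sum()
    return (2.0 * inter + eps) / ((a * a).sum() + (b * b).sum() + eps)


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at about 3 sigma, normalised to sum to 1."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(mask, sigma):
    """Separable Gaussian blur of a 2-D array with zero padding at the borders.

    Assumes each image side is longer than the kernel.
    """
    k = gaussian_kernel(sigma)
    out = np.asarray(mask, dtype=float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="same"), axis, out
        )
    return out


def multiscale_smoothed_dice_loss(fixed, warped, sigmas=(0.0, 1.0, 2.0)):
    """Hypothetical loss: 1 minus the mean Dice over several smoothing scales.

    sigma == 0 means "no smoothing", i.e. the plain Dice term.
    """
    scores = []
    for s in sigmas:
        f = fixed if s == 0 else smooth(fixed, s)
        w = warped if s == 0 else smooth(warped, s)
        scores.append(dice(f, w))
    return 1.0 - float(np.mean(scores))


# Toy example: one fixed square and two candidate "warped" squares.
fixed = np.zeros((32, 32))
fixed[4:12, 4:12] = 1.0
near = np.zeros((32, 32))
near[4:12, 6:14] = 1.0   # shifted by 2 pixels, still overlapping
far = np.zeros((32, 32))
far[4:12, 14:22] = 1.0   # shifted by 10 pixels, no overlap

loss_same = multiscale_smoothed_dice_loss(fixed, fixed)
loss_near = multiscale_smoothed_dice_loss(fixed, near)
loss_far = multiscale_smoothed_dice_loss(fixed, far)
```

In this toy setting the plain Dice between `fixed` and `far` is essentially zero because the masks do not overlap, whereas the smoothed terms still respond to how close the masks are. This is one plausible reason a smoothed, multi-scale overlap term can give a more informative training signal for registration than a hard overlap measure alone.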