Deformable image registration (DIR) between daily and reference images is fundamentally important for adaptive radiotherapy. In the last decade, deep learning-based image registration methods have been developed with faster computation time and improved robustness compared to traditional methods. However, the registration performance is often degraded in extra-cranial sites with large volume containing multiple anatomic regions, such as Computed Tomography (CT)/Magnetic Resonance (MR) images used in head and neck (HN) radiotherapy. In this study, we developed a hierarchical deformable image registration (DIR) framework, Patch-based Registration Network (Patch-RegNet), to improve the accuracy and speed of CT-MR and MR-MR registration for head-and-neck MR-Linac treatments. Patch-RegNet includes three steps: a whole volume global registration, a patch-based local registration, and a patch-based deformable registration. Following a whole-volume rigid registration, the input images were divided into overlapping patches. Then a patch-based rigid registration was applied to achieve accurate local alignment for subsequent DIR. We developed a ViT-Morph model, a combination of a convolutional neural network (CNN) and the Vision Transformer (ViT), for the patch-based DIR. A modality independent neighborhood descriptor was adopted in our model as the similarity metric to account for both inter-modality and intra-modality registration. The CT-MR and MR-MR DIR models were trained with 242 CT-MR and 213 MR-MR image pairs from 36 patients, respectively, and both tested with 24 image pairs (CT-MR and MR-MR) from 6 other patients. The registration performance was evaluated with 7 manually contoured organs (brainstem, spinal cord, mandible, left/right parotids, left/right submandibular glands) by comparing with the traditional registration methods in Monaco treatment planning system and the popular deep learning-based DIR framework, Voxelmorph. Evaluation results show that our method outperformed VoxelMorph by 6 % for CT-MR registration, and 4 % for MR-MR registration based on DSC measurements. Our hierarchical registration framework has been demonstrated achieving significantly improved DIR accuracy of both CT-MR and MR-MR registration for head-and-neck MR-guided adaptive radiotherapy.