Deep-learning-based deformable image registration (DL-DIR) has demonstrated improved accuracy compared to time-consuming non-DL methods across various anatomical sites. However, DL-DIR is still challenging in heterogeneous tissue regions with large deformation. In fact, several state-of-the-art DL-DIR methods fail to capture the large, anatomically plausible deformation when tested on head-and-neck computed tomography (CT) images. These results allude to the possibility that such complex head-and-neck deformation may be beyond the capacity of a single network structure or a homogeneous smoothness regularization. To address the challenge of combined multi-scale musculoskeletal motion and soft tissue deformation in the head-and-neck region, we propose a MUsculo-Skeleton-Aware (MUSA) framework to anatomically guide DL-DIR by leveraging the explicit multiresolution strategy and the inhomogeneous deformation constraints between the bony structures and soft tissue. The proposed method decomposes the complex deformation into a bulk posture change and residual fine deformation. It can accommodate both inter- and intra- subject registration. Our results show that the MUSA framework can consistently improve registration accuracy and, more importantly, the plausibility of deformation for various network architectures. The code will be publicly available at https://github.com/HengjieLiu/DIR-MUSA.