Hyperspectral image (HSI) and light detection and ranging (LiDAR) data have gained significant attention because they are highly complementary and can be combined to improve the accuracy of land cover classification. However, many challenges remain, such as unrelated physical characteristics, different data structures, and a lack of labeled samples. Many methods fail to exploit the full potential of multi-source data, particularly hierarchical complementary information. To address these problems, a hierarchical coarse–fine adaptive (HCFA) fusion network with dynamic convolution and a transformer is proposed for multi-source remote sensing land cover classification. Although fusing hierarchical information can improve classification accuracy, improper hierarchical feature selection and optimization may degrade the classification results. Therefore, a coarse–fine mutual learning strategy is proposed to dynamically fuse hierarchical information. Additionally, the disparity between the data sources still hinders effective fusion. To tackle this challenge, cross-tokenization and cross-token attention are implemented to enhance information interaction between the sources. Furthermore, to improve the model's representational ability at limited computational cost, we combine the advantages of dynamic convolution with those of a transformer. Validation on three standard datasets demonstrates that HCFA achieves high accuracy with just 1% of the training set while maintaining low computational cost.
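As an illustration of the general idea behind cross-token attention, the following minimal sketch lets tokens from one source attend to tokens from the other and adds the result back through a residual connection. It is not the HCFA implementation: the function names, the toy token values, and the omission of learned query/key/value projections and multiple heads are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_token_attention(query_tokens, context_tokens):
    """Scaled dot-product attention from one modality to another.

    Assumption: identity projections (no learned weights) and a single head,
    so this only illustrates the information flow between the two sources.
    """
    dim = len(query_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in query_tokens:
        # Similarity of this query token to every token of the other source.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in context_tokens]
        weights = softmax(scores)
        # Weighted sum of the other source's tokens.
        attended = [sum(w * v[d] for w, v in zip(weights, context_tokens))
                    for d in range(dim)]
        # Residual connection keeps the original token information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Hypothetical toy tokens: 3 HSI tokens and 2 LiDAR tokens of dimension 2.
hsi_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
lidar_tokens = [[0.2, 0.8], [0.9, 0.1]]

# Each source queries the other, giving a symmetric exchange of information.
fused_hsi = cross_token_attention(hsi_tokens, lidar_tokens)
fused_lidar = cross_token_attention(lidar_tokens, hsi_tokens)
```

In this sketch each source acts as the query in turn, so both token sets are enriched with information from the other. A practical model would add learned projections, normalization, and multiple heads.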