Transformers have shown remarkable success in modeling sequential data and capturing intricate patterns over long distances. Their self-attention mechanism allows for efficient parallel processing and scalability, making them well-suited for the high-dimensional data in hyperspectral and LiDAR imagery. However, further research is needed on how to more deeply integrate the features of two modalities in attention mechanisms. In this paper, we propose a novel Multi-Feature Cross Attention-Induced Transformer Network (MCAITN) designed to enhance the classification accuracy of hyperspectral and LiDAR data. The MCAITN integrates the strengths of both data modalities by leveraging a cross-attention mechanism that effectively captures the complementary information between hyperspectral and LiDAR features. By utilizing a transformer-based architecture, the network is capable of learning complex spatial-spectral relationships and long-range dependencies. The cross-attention module facilitates the fusion of multi-source data, improving the network’s ability to discriminate between different land cover types. Extensive experiments conducted on benchmark datasets demonstrate that the MCAITN outperforms state-of-the-art methods in terms of classification accuracy and robustness.