The advancement of 6G (sixth-generation mobile network) communication technology places demands on communication efficiency and quality that traditional network architectures struggle to meet. Semantic communication, characterized by its “understand before transmit” approach, has emerged as a pivotal technology for 6G because it can improve both efficiency and quality. The Wireless Image Transmission Transformer (WITT) is a semantic communication system that uses a vision transformer to transmit images through feature extraction and channel adaptation, and it has proven effective at this task. This study introduces an improved channel-adaptive module based on deep learning and on the adaptive modulation principles of the Variational Information Bottleneck (VIB). Integrating this module into the original WITT model yields the Adaptive Wireless Image Transmission Transformer (ADWITT) architecture. Experimental results show that the transmission performance of ADWITT substantially surpasses that of the original WITT model, particularly under harsh channel conditions. These findings underscore the robustness and adaptability of the ADWITT approach, which offers improved performance and resilience for image transmission in environments where traditional methods falter.