Well logging fluid prediction is a key step in assessing oil and gas reserves. By analyzing downhole logging data, the types of fluid held in subsurface rocks, such as crude oil, natural gas, and water, can be determined. This information is crucial for estimating the abundance and recoverable reserves of oil and gas resources, and it helps guide exploration and development work. We introduce a model called CWT (Continuous Wavelet Transform)-ViT (Vision Transformer). The CWT provides frequency information at many scales simultaneously, which lets the model analyze downhole logging data at multiple resolutions rather than at a single fixed one. Subsurface rock structures often exhibit features at multiple scales, and the CWT can capture these features, which helps distinguish between fluid types. The ViT model is built on the Transformer architecture, whose self-attention lets every position in the input attend to every other position, so the dependencies it models are not restricted to a local neighborhood. This enables the model to integrate information across the whole logging interval and extract richer features. For complex geological structures and fluid distributions, this global attention mechanism helps the model capture the overall context, thereby improving the accuracy of fluid prediction. Using the CWT-ViT method for well logging fluid prediction, we achieved an accuracy of 97.50% on the first dataset and 97.77% on the second dataset. These results indicate that the CWT-ViT method is robust and effective for fluid prediction from well logging data. We also conducted blind well experiments, in which our CWT-ViT model outperformed the other models compared, achieving a blind well prediction accuracy of 97.36%.
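To make the multiscale idea concrete, here is a minimal NumPy sketch that turns a single logging curve into a CWT scalogram: a magnitude image with one row per scale and one column per depth sample, which is the kind of 2D input a ViT can consume. The source does not state which mother wavelet, scales, or sampling interval were used, so the complex Morlet wavelet with `w0 = 6`, the function names `morlet` and `cwt_scalogram`, and any scale values are assumptions for illustration only.

```python
import numpy as np


def morlet(t, w0=6.0):
    # Complex Morlet wavelet (assumed choice; the small admissibility
    # correction term is omitted, which is standard for w0 >= 5).
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def cwt_scalogram(signal, scales, dt=1.0, w0=6.0):
    """Return |W(s, b)| with shape (len(scales), len(signal)).

    W(s, b) = (1 / sqrt(s)) * sum_m x[m] * conj(psi((m - b) * dt / s)) * dt
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the scaled wavelet on a support of about +/- 4 scale units.
        half = int(np.ceil(4 * s / dt))
        t = np.arange(-half, half + 1) * dt / s
        psi = morlet(t, w0) / np.sqrt(s)
        # Correlation with psi == convolution with the reversed conjugate.
        full = np.convolve(signal, np.conj(psi[::-1])) * dt
        # Re-centre the 'full' output so column b lines up with depth sample b.
        out[i] = np.abs(full[half:half + n])
    return out
```

For a Morlet wavelet with `w0 = 6`, scale `s` responds most strongly to periods of roughly `2 * pi * s / w0` samples, so small scales pick out short-wavelength variation along depth while large scales pick out slowly varying trends. Stacking one scalogram per logging curve is one possible way to build a multichannel input image; the source does not say how its curves were combined.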
The experiments therefore indicate that the key to the accuracy gains the CWT brings to well logging fluid prediction lies in its multiscale analysis capability, which captures the characteristic frequencies of different fluids. The CWT also enhances signal features and suppresses noise, increasing the precision of fluid identification. Finally, integrating the CWT with the ViT further improves fluid prediction performance, particularly in complex geological environments. The advantages of the ViT for fluid prediction include strong sequence modeling, effective handling of long-range dependencies, and an improved ability to capture fluid characteristics in complex well logging data.
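The ViT side can be sketched in the same hedged way: cut a scalogram into non-overlapping patches, treat each flattened patch as a token, and apply scaled dot-product self-attention so that every patch is weighted against every other patch. This is a single-head NumPy sketch with random projection matrices; the patch size, embedding width, and the function names `patchify` and `self_attention` are hypothetical, and a real ViT would add a learned patch embedding, positional embeddings, a class token, multiple heads, and stacked encoder layers.

```python
import numpy as np


def patchify(scalogram, patch):
    """Split an (H, W) scalogram into non-overlapping patch x patch tiles,
    flattened into a (num_patches, patch * patch) token matrix (row-major)."""
    h, w = scalogram.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of patch"
    tiles = scalogram.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over all tokens.

    Returns the attended outputs and the (num_tokens, num_tokens) weights.
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over every patch
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scalogram = rng.random((16, 64))  # stand-in for a CWT magnitude image
    tokens = patchify(scalogram, 8)  # 2 x 8 = 16 tokens of length 64
    wq, wk, wv = (rng.normal(size=(64, 32)) * 0.1 for _ in range(3))
    out, weights = self_attention(tokens, wq, wk, wv)
    print(out.shape, weights.shape)
```

Because the softmax is taken over all patches, each row of the attention matrix assigns a weight to every patch in the scalogram, so a patch at one depth can draw directly on a patch far away along the logging interval. This is the sense in which attention is global; the trade-off is that memory and compute grow quadratically with the number of patches.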