Network traffic prediction is crucial for cost-effective network management, resource allocation, and security in emerging software-defined and zero-touch networks. Machine Learning (ML) models such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have shown great promise in traffic prediction. This work proposes a hybrid ML model that combines pre-trained AlexNet and ResNet models, a 2D Convolutional Neural Network (CNN), and a Vision Transformer (ViT) with LSTM/GRU models and their variants to predict network traffic. We explore two approaches for converting the time series data into images, which allows more precise feature extraction, and then perform traffic prediction on the resulting image dataset to increase accuracy. Given their proficiency in extracting image features, we use three pre-trained models: AlexNet, ResNet combined with a CNN, and ViT. The features extracted by these models are then used as inputs to the LSTM, Bi-LSTM, GRU, and Bi-GRU models that predict network traffic. The models are evaluated on the GÉANT and Abilene network traffic datasets using the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Comparison with existing network traffic prediction techniques shows considerable improvements in all performance metrics, as confirmed by the Wilcoxon test. For the GÉANT and Abilene datasets respectively, our best model achieves an MSE of 0.00058 and 0.03078, an RMSE of 0.02415 and 0.17544, and an MAE of 0.00774 and 0.13384.
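
For reference, the three error metrics named in the abstract follow their standard definitions. The following is a minimal Python sketch of those definitions, not the evaluation code used in this work; the function names and the sample values are illustrative assumptions and are not taken from the GÉANT or Abilene experiments.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative (made-up) normalized traffic values, not real measurements.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.36]

print("MSE :", mse(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("MAE :", mae(observed, predicted))
```

Because RMSE is the square root of MSE, the reported pairs can be checked against each other: for Abilene, the square root of 0.03078 is approximately 0.17544, which matches the reported RMSE.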