To accelerate the back-end design flow of integrated circuits (ICs), numerous studies have explored machine learning (ML) for electronic design automation (EDA). However, most of this work is limited to deep learning (DL) models based predominantly on convolutional neural networks, and these models often generalize poorly because training data are scarce. In this study, we propose the Double Generative Adversarial Networks (D-GAN) model to enrich the dataset and the Regression Vision Transformer (R-ViT) model to predict layout congestion. Compared with the baseline model, experimental results show improvements of 3.03% in the area under the receiver operating characteristic curve (ROC-AUC) and 2.64% in the area under the precision-recall curve (PRC-AUC). To further improve prediction accuracy, we design an adaptive Huber loss function to optimize the training process, which yields an improvement of up to 11.03% in ROC-AUC over the baseline model. Lastly, extended experiments study the effects of parameters and convolutional kernel size on performance and identify a better-performing configuration.
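For readers unfamiliar with the Huber loss, the following is a minimal sketch of the general idea, not the loss function used in this study. The abstract does not state how the loss is made adaptive, so the rule below (setting the threshold `delta` to a quantile of the absolute residuals in the current batch) is purely an assumption for illustration. The function names `huber`, `adaptive_delta`, and `adaptive_huber_loss` are likewise hypothetical.

```python
def huber(residual, delta):
    """Standard Huber loss for one residual: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def adaptive_delta(residuals, quantile=0.9, floor=1e-6):
    """ASSUMPTION (not from the paper): pick delta as a quantile of |residual| in the batch."""
    if not residuals:
        raise ValueError("residuals must be non-empty")
    mags = sorted(abs(r) for r in residuals)
    # Index of the requested quantile, clamped to the last element.
    idx = min(len(mags) - 1, int(quantile * len(mags)))
    # The floor keeps delta strictly positive when every residual is zero.
    return max(mags[idx], floor)


def adaptive_huber_loss(preds, targets, quantile=0.9):
    """Mean Huber loss over a batch, with delta recomputed from that batch's residuals."""
    residuals = [p - t for p, t in zip(preds, targets)]
    delta = adaptive_delta(residuals, quantile)
    return sum(huber(r, delta) for r in residuals) / len(residuals)
```

With a fixed `delta`, small errors are penalized quadratically and large errors only linearly, which limits the influence of outlier congestion values. Recomputing `delta` per batch is one plausible way to let that transition point follow the error distribution as training progresses.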