To address the low accuracy and false counts of existing models in road damage object detection and tracking, we propose Road-TransTrack, a tracking model optimized with a transformer. First, a YOLOv5-based classification network sorts the collected road damage images into two categories, potholes and cracks, and these images form a road damage dataset. Next, the tracking model is improved with a transformer and a self-attention mechanism. Finally, the trained model is applied to real road videos to verify its effectiveness. The proposed tracking network achieves accuracies of 91.60% and 98.59% for road cracks and potholes, respectively, with corresponding F1 scores of 0.9417 and 0.9847. The experimental results show that Road-TransTrack outperforms conventional convolutional neural networks in both detection accuracy and counting accuracy on road damage object detection and tracking tasks.
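For reference, the F1 score reported above is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition and computes all three metrics from true-positive, false-positive, and false-negative counts. The function name `precision_recall_f1` is hypothetical, and the abstract does not specify the evaluation protocol (for example, the IoU threshold used to match a detection to a ground-truth object), so this illustrates the metric only and does not reproduce the paper's evaluation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from detection counts.

    Hypothetical helper assuming the standard metric definitions:
      tp: detections matched to a ground-truth damage instance
      fp: detections with no matching ground truth (e.g. false counts)
      fn: ground-truth instances the model missed
    """
    # Guard against division by zero when there are no predictions or no targets.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
    print(tuple(round(x, 4) for x in (p, r, f)))  # prints (0.9, 0.9, 0.9)
```

The harmonic mean penalizes an imbalance between precision and recall. A model that detects every pothole but also reports many spurious ones will therefore score a lower F1 than its recall alone suggests.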