Inclement weather, particularly heavy rain and the resulting wet road surfaces, adversely affects driving conditions, traffic infrastructure, and operational plans. Mitigating these effects requires monitoring the weather continuously, in real time, and at high geospatial granularity. Traditionally, road weather conditions are monitored using weather forecasts or Road Weather Information Systems (RWIS). However, these methods are either ill-suited or too expensive to provide the required fine-grained observations. By contrast, roadside traffic CCTV cameras are widely deployed on US freeways and can serve as inexpensive weather sensors. In this paper, a novel vision-based methodology is proposed to detect rain and road surface conditions from roadside traffic camera images. Vision Transformers were used for image-based classification and outperformed convolution-based approaches. Furthermore, the geographical distribution of the roadside cameras was leveraged to add spatial context awareness to the detection model: a Spatial Self-Attention network was proposed to model the relationship between the detection results of adjacent images, framing detection as a sequence-to-sequence task. The results indicate that adding this sequential detection module improved the F1-scores of the stand-alone Vision Transformer by 5.61% and 5.97% for the rain and road surface condition detection tasks, respectively, raising the overall F1-scores to 96.71% and 98.07%.
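To illustrate the idea of adding spatial context to per-camera detections, the following is a minimal sketch of scaled dot-product self-attention applied to a sequence of adjacent cameras. It is not the paper's Spatial Self-Attention network: the function name `spatial_self_attention`, the use of raw class probabilities as queries, keys, and values (with no learned projections), and the example probabilities are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(features):
    """Scaled dot-product self-attention over a sequence of per-camera vectors.

    Assumption: queries, keys, and values are the raw input vectors.
    A trained network would apply learned projections to each of them.
    """
    dim = len(features[0])
    refined = []
    for query in features:
        # Similarity of this camera to every camera in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of all cameras' vectors.
        refined.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ])
    return refined


# Hypothetical stand-alone classifier outputs, [P(no rain), P(rain)],
# for four adjacent cameras along one corridor (made-up values).
camera_probs = [[0.10, 0.90], [0.20, 0.80], [0.60, 0.40], [0.15, 0.85]]
refined = spatial_self_attention(camera_probs)
```

In this toy example the third camera disagrees with its neighbours, and attention pulls its rain probability toward theirs. Because each output is a convex combination of the inputs, every refined vector still sums to one.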