Accurate solar radiation forecasts are necessary for solar power plants and local energy grids. Precise solar predictions can help optimise plant operations and assist in managing fluctuations in power generation. Solar forecasting requires high-quality solar data, which may be accessed through open databases such as the Baseline Surface Radiation Network. Because of maintenance and equipment malfunctions, these databases may contain missing data. Deep learning methods can extract relevant features and non-linearities from data, which makes them suitable for forecasting and imputation tasks. The development of the Transformer architecture in 2017 marked an improvement over recurrent sequence processing with recurrent neural network (RNN) architectures, replacing it with parallel sequence handling based on self-attention mechanisms. Previous deep learning irradiance forecasting models utilised an RNN or an extension of the RNN architecture. This paper presents novel applications of Transformer models to solar imputation and forecasting. Two deep learning architectures – Transformer and SAITS (Self-Attention-based Imputation for Time Series) – were trained to impute time series of global horizontal irradiance. Additionally, a Temporal Fusion Transformer (TFT) deep learning architecture was trained to forecast future values of global horizontal irradiance time series. Feature concatenation allows differing exogenous variables to be combined into a single input vector for a deep learning solar forecasting model. Previous solar forecasting models either do not incorporate cloud data or use only a simple exogenous categorical mask to indicate cloud cover. In this paper, GOES-16 cloud optical depth and particle size datasets provide an improved representation of cloud properties. Classical imputation and forecasting models were also developed and used as baselines for comparison with the deep learning models. The models were evaluated using standard statistical metrics for time series.
The deep learning TFT forecasting model produced better forecasts than the classical regression model. However, the classical k-nearest neighbours (KNN) imputer outperformed both deep learning imputation models.
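As an illustration of the feature concatenation and KNN imputation baseline described above, the following is a minimal sketch in plain Python and not the implementation used in this work. The function names (`concat_features`, `knn_impute`, `rmse`, `mae`), the choice of `k`, the use of RMSE and MAE as the evaluation metrics, and all numerical values are assumptions made for illustration only. The sketch also assumes that only the irradiance column contains gaps and that the features are not rescaled.

```python
import math


def concat_features(ghi, cloud_optical_depth, particle_size):
    """Join per-timestep variables into one input vector per timestep."""
    return [list(step) for step in zip(ghi, cloud_optical_depth, particle_size)]


def knn_impute(rows, target=0, k=2):
    """Fill None entries in column `target` with the mean of that column over
    the k nearest complete rows (Euclidean distance on the other columns).
    Assumes the non-target columns have no missing values."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        new_row = list(row)
        if row[target] is None:
            def distance(other, row=row):
                # Skip the target column, which is missing in `row`.
                return math.sqrt(sum(
                    (a - b) ** 2
                    for i, (a, b) in enumerate(zip(row, other))
                    if i != target
                ))
            nearest = sorted(complete, key=distance)[:k]
            new_row[target] = sum(n[target] for n in nearest) / len(nearest)
        filled.append(new_row)
    return filled


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only; these are not data from this work.
ghi_true = [100.0, 220.0, 300.0, 400.0]      # hypothetical GHI series
ghi_observed = [100.0, None, 300.0, 400.0]   # same series with one gap
cloud_optical_depth = [10.0, 8.0, 6.0, 4.0]  # hypothetical exogenous feature
particle_size = [20.0, 18.0, 16.0, 14.0]     # hypothetical exogenous feature

rows = concat_features(ghi_observed, cloud_optical_depth, particle_size)
filled = knn_impute(rows, target=0, k=2)

# Score the imputation only at the positions that were missing.
missing = [i for i, v in enumerate(ghi_observed) if v is None]
truth = [ghi_true[i] for i in missing]
imputed = [filled[i][0] for i in missing]
print("imputed:", imputed, "RMSE:", rmse(truth, imputed), "MAE:", mae(truth, imputed))
```

With these toy values, the two nearest complete timesteps to the gap are the first and third, so the gap is filled with their mean irradiance. Scoring only the masked positions against held-out true values is one common way to compare imputers. In practice the features would usually be normalised before computing distances.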