Accurate ultra-short-term solar irradiance forecasting for photovoltaic (PV) systems is complicated by rapidly changing weather. Ground-based sky images offer a way to improve such forecasts, but extracting spatiotemporal information from these images remains a significant hurdle for current computer vision models. To address this challenge, this work proposes a hybrid model, "An Attention Fused Sequence-to-Sequence Convolutional Neural Network," that predicts intra-hour global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI) with a 10-min lead time. The model combines a Convolutional Neural Network (CNN) for spatial feature extraction from sky images, an attention mechanism that focuses on the most relevant image regions, and a sequence-to-sequence model for temporal feature extraction from time-series data. The proposed model is trained on the NREL Solar Radiation Research Laboratory dataset and evaluated using the Mean Bias Error (MBE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), coefficient of determination (R²), and forecast skill score (FSS). The configuration with a 10-min lead time and a sequence length of 2 performs best across most evaluation metrics, yielding, for the two reported result sets respectively, an MBE of 2.321 W/m², an MAE of 39.490 W/m², an RMSE of 62.086 W/m², an R² of 0.909, and an FSS of 23.589, and an MBE of 4.876 W/m², an MAE of 56.887 W/m², an RMSE of 85.346 W/m², an R² of 0.834, and an FSS of 24.881. Furthermore, a sensitivity analysis shows that the model's performance depends on both the sequence length and the lead time. The proposed framework outperforms the other techniques compared for ultra-short-term PV generation forecasting, demonstrating its potential for practical deployment in PV systems to improve grid reliability and energy management.
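The five evaluation metrics named above can be computed as in the minimal sketch below. It is not the authors' evaluation code. Several points are assumptions. MBE is taken as forecast minus observation, so a positive value means over-forecasting. FSS is taken as the common skill definition 100 · (1 − RMSE_model / RMSE_ref) against a reference forecast such as persistence; the abstract does not state which reference or scaling the paper uses. The function names (`mbe`, `forecast_skill`, `evaluate`, and so on) and the sample irradiance values are hypothetical and for illustration only.

```python
import math
from typing import Dict, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Guard against empty or mismatched inputs before averaging.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mbe(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Bias Error, forecast minus observation (assumed sign convention)."""
    _check(y_true, y_pred)
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error, in the same units as the inputs (e.g. W/m^2)."""
    _check(y_true, y_pred)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error, in the same units as the inputs."""
    _check(y_true, y_pred)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (dimensionless)."""
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant observation series")
    return 1.0 - ss_res / ss_tot


def forecast_skill(y_true, y_pred, y_ref) -> float:
    """Skill in percent relative to a reference forecast (assumed definition)."""
    return 100.0 * (1.0 - rmse(y_true, y_pred) / rmse(y_true, y_ref))


def evaluate(y_true, y_pred, y_ref) -> Dict[str, float]:
    """Collect all five metrics for one target (e.g. GHI) in a dict."""
    return {
        "MBE": mbe(y_true, y_pred),
        "MAE": mae(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
        "R2": r2(y_true, y_pred),
        "FSS": forecast_skill(y_true, y_pred, y_ref),
    }


if __name__ == "__main__":
    # Hypothetical irradiance values in W/m^2, not data from the paper.
    observed = [100.0, 200.0, 300.0, 400.0]
    model = [110.0, 190.0, 310.0, 390.0]
    # Hypothetical persistence reference: the previous observation.
    persistence = [100.0, 100.0, 200.0, 300.0]
    print(evaluate(observed, model, persistence))
```

Note that R² and FSS are dimensionless, while MBE, MAE, and RMSE carry the irradiance units (W/m²). In practice, the persistence reference for a 10-min lead time would be built by shifting the observed series by the number of samples in 10 minutes.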