Recurrent Neural Networks (RNNs), particularly when equipped with output windows (a standard practice in contemporary time series forecasting), have shown proficiency in handling short-term dependencies. Nonetheless, RNNs struggle to maintain hidden states over extended forecasting horizons: in longer-term prediction, larger hidden states and longer look-back windows can lead to gradient instability. In contrast, Transformer-based models, whose architecture encodes complex contextual relationships and supports parallel computation, have emerged as a popular alternative in this field. However, current research has focused mainly on modifying attention mechanisms while overlooking opportunities to improve the feedforward layer, which limits potential efficiency gains. Moreover, prevailing methods often assume complete independence among channels, disregarding the distinct characteristics of individual variables and failing to fully exploit channel-specific information. To address these gaps, we propose an efficient Transformer design for multivariate time series prediction. Our approach integrates two key components: (i) a gated residual attention unit that enhances predictive accuracy and computational efficiency, and (ii) a channel embedding technique that differentiates between series and boosts performance. Theoretically, we prove that our model exhibits the recurrent dynamics introduced by its RNN layer. Through extensive experiments on real-world data, we demonstrate that the proposed method achieves competitive predictive accuracy compared with prior approaches while running faster than state-of-the-art Transformers. Our code, data, and trained models are available at https://github.com/MythosAd/GRAformer.
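For intuition only, the sketch below shows one plausible form of a gated residual attention block in PyTorch: multi-head self-attention whose output is scaled by a learned sigmoid gate before the residual connection. The module name, gating form, and dimensions here are illustrative assumptions, not necessarily the exact GRAformer design described in the paper.

```python
import torch
import torch.nn as nn


class GatedResidualAttention(nn.Module):
    """Illustrative sketch (assumed, not the paper's exact block):
    self-attention output modulated by a learned gate, then added
    back to the input through a residual connection."""

    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        # Gate deciding, per feature, how much attention output to pass.
        self.gate = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence length, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        gated = self.gate(x) * attn_out   # element-wise gating
        return self.norm(x + gated)       # gated residual connection


# Minimal usage example on random data.
block = GatedResidualAttention(d_model=64, n_heads=4)
x = torch.randn(8, 96, 64)                # (batch, time steps, features)
print(block(x).shape)                     # torch.Size([8, 96, 64])
```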