In this Communication, we demonstrate that a deep artificial neural network based on a transformer architecture with self-attention layers can predict the long-time population dynamics of a quantum system coupled to a dissipative environment provided that the short-time population dynamics of the system is known. The transformer neural network model developed in this work predicts the long-time dynamics of spin-boson model efficiently and very accurately across different regimes, from weak system-bath coupling to strong coupling non-Markovian regimes. Our model is more accurate than classical forecasting models, such as recurrent neural networks, and is comparable to the state-of-the-art models for simulating the dynamics of quantum dissipative systems based on kernel ridge regression.