The efficient use of resources is one of the most important goals in current engineering development. In the field of lightweight construction, rubber and reinforced polymer composite materials are becoming increasingly important, which steadily raises the demands on simulation software. The development of robust and efficient algorithms for these simulations is the aim of this paper.

The application of higher order finite elements in space improves the solution quality, but causes a higher computational cost compared to linear elements. We propose higher order finite elements in space and time in combination with a reinforced viscoelastic material formulation in a variational framework; to this end, a higher order approximation is applied in both space and time. The application of variational-based time integrators guarantees that the total balance of linear momentum and the total balance of angular momentum are preserved. In order to also fulfill the total balance of energy, the variational framework is extended with discrete gradients. The resulting time stepping scheme represents a very robust and consistent algorithm for transient finite element simulations with reinforced viscoelastic materials and boundary conditions.

In an implementation, however, the higher order approximation in space and time combined with the viscoelastic material leads to a high computational effort. Hence, an efficient implementation is required in order to reduce the computational time to a minimum. In our approach, we address this problem by using a GPU and NVIDIA's CUDA programming architecture, which allows a massive parallelization of the time-consuming parts of the simulation. We introduce a pipeline design for the GPU implementation which provides multiple advantages: it allows a simple porting of an already existing implementation by means of self-managing pipeline stages, while a significant speedup is still achieved through further optimizations that exploit the GPU architecture. In addition, combining both hardware resources, GPU and CPU, reduces the computational time significantly once more; our GPU implementation therefore easily allows a distribution of the computational effort between GPU and CPU. Finally, numerical examples demonstrate the achieved speedup, and the impact of combining GPU and CPU is studied in detail.
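
The extension with discrete gradients mentioned above can be illustrated by a classical Gonzalez-type construction; the following is a sketch of one common variant and not necessarily the specific discrete gradient developed in this paper. For an internal potential $V$ evaluated at the configurations $\mathbf{q}_n$ and $\mathbf{q}_{n+1}$ of a time step, one may take

\[
\overline{\nabla} V(\mathbf{q}_n,\mathbf{q}_{n+1})
= \nabla V(\mathbf{q}_{n+\frac{1}{2}})
+ \frac{V(\mathbf{q}_{n+1}) - V(\mathbf{q}_n)
        - \nabla V(\mathbf{q}_{n+\frac{1}{2}}) \cdot (\mathbf{q}_{n+1}-\mathbf{q}_n)}
       {\lVert \mathbf{q}_{n+1}-\mathbf{q}_n \rVert^{2}}
  \, (\mathbf{q}_{n+1}-\mathbf{q}_n),
\qquad
\mathbf{q}_{n+\frac{1}{2}} = \tfrac{1}{2}\,(\mathbf{q}_n+\mathbf{q}_{n+1}).
\]

This discrete gradient satisfies the directionality condition $\overline{\nabla} V(\mathbf{q}_n,\mathbf{q}_{n+1}) \cdot (\mathbf{q}_{n+1}-\mathbf{q}_n) = V(\mathbf{q}_{n+1}) - V(\mathbf{q}_n)$, which is exactly the property that allows a time stepping scheme to reproduce the total balance of energy in the discrete setting.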
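
As a hedged illustration of the pipeline design and the GPU/CPU work distribution described above, the following CUDA sketch splits a set of per-element evaluations between host and device and overlaps transfers with computation by means of streams. All names (evaluate_element_gpu, evaluate_element_cpu, gpu_share, the chunked stream loop) are illustrative assumptions and do not reflect the actual implementation of the paper.

```cuda
// Minimal sketch: chunked, stream-based pipeline with a GPU/CPU work split.
// The element routine is a placeholder for the expensive quadrature-point
// work (stresses, internal forces) of the higher order viscoelastic elements.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void evaluate_element_gpu(const double* dofs, double* residual, int n)
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e < n)
        residual[e] = 2.0 * dofs[e];   // placeholder for the element routine
}

// CPU counterpart for the share of elements that stays on the host.
void evaluate_element_cpu(const double* dofs, double* residual, int n)
{
    for (int e = 0; e < n; ++e)
        residual[e] = 2.0 * dofs[e];
}

int main()
{
    const int n_elements = 1 << 20;
    const double gpu_share = 0.8;   // assumed fraction of elements sent to the GPU
    const int n_gpu = static_cast<int>(gpu_share * n_elements);
    const int n_cpu = n_elements - n_gpu;

    std::vector<double> dofs(n_elements, 1.0), residual(n_elements, 0.0);

    double *d_dofs = nullptr, *d_res = nullptr;
    cudaMalloc(&d_dofs, n_gpu * sizeof(double));
    cudaMalloc(&d_res,  n_gpu * sizeof(double));

    // One stream per chunk, so copy-in, kernel and copy-out of different
    // chunks can overlap (true overlap additionally requires pinned host
    // memory, e.g. allocated with cudaMallocHost).
    const int n_chunks = 4;
    const int chunk = (n_gpu + n_chunks - 1) / n_chunks;
    cudaStream_t streams[n_chunks];
    for (int i = 0; i < n_chunks; ++i) cudaStreamCreate(&streams[i]);

    for (int i = 0; i < n_chunks; ++i) {
        int offset = i * chunk;
        int count  = (offset + chunk <= n_gpu) ? chunk : n_gpu - offset;
        if (count <= 0) break;
        cudaMemcpyAsync(d_dofs + offset, dofs.data() + offset,
                        count * sizeof(double), cudaMemcpyHostToDevice, streams[i]);
        int threads = 256, blocks = (count + threads - 1) / threads;
        evaluate_element_gpu<<<blocks, threads, 0, streams[i]>>>(
            d_dofs + offset, d_res + offset, count);
        cudaMemcpyAsync(residual.data() + offset, d_res + offset,
                        count * sizeof(double), cudaMemcpyDeviceToHost, streams[i]);
    }

    // While the GPU chunks are in flight, the CPU processes its own share.
    evaluate_element_cpu(dofs.data() + n_gpu, residual.data() + n_gpu, n_cpu);

    cudaDeviceSynchronize();
    for (int i = 0; i < n_chunks; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(d_dofs); cudaFree(d_res);

    printf("residual[0] = %f, residual[last] = %f\n",
           residual[0], residual[n_elements - 1]);
    return 0;
}
```

In a pipeline design along these lines, each self-managing stage would own such a transfer-compute-transfer sequence together with its buffers, and a parameter like gpu_share could be tuned to balance the load between the two processors.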