Abstract

Time Series Forecasting (TSF) is essential to key domains, and the Transformer neural network has advanced the state-of-the-art on global, multi-horizon TSF benchmarks. The quadratic time and memory complexity of the Vanilla Transformer (VT) hinders its application to Big Data environments; therefore, multiple efficient variants of the VT that lower complexity via sparse self-attention have been proposed. However, less complex algorithms do not directly produce faster executions, and machine learning models for Big Data are typically trained on accelerators designed for dense-matrix computation that render slower performance with sparse matrices. To better compare the accuracy-speed trade-off of the VT and its variants, it is essential to test them on such accelerators. We implemented a cloud-based VT on Tensor Processing Units to address this task. Experiments on large-scale datasets show that our Transformer achieves good predictive performance when compared to state-of-the-art models while reducing training times from hours to under 2 min.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call