Time series forecasting is the problem of predicting future data samples from historical information and recent deep neural networks (DNNs) based techniques have achieved excellent results compared with conventional statistical approaches. Many applications at the edge can utilise this technology and most implementations have focused on inference, an ability to train at the edge would enable the deep neural network (DNN) to adapt to changing conditions. Unfortunately, training requires approximately three times more memory and computation than inference. Moreover, edge applications are often constrained by energy efficiency. In this work, we implement a block minifloat (BM) training accelerator for a time series prediction network, N-BEATS. Our architecture involves a mixed precision GEMM accelerator that utilizes block minifloat (BM) arithmetic. We use a 4-bit DSP packing scheme to optimize the implementation further, achieving a throughput of 779 Gops. The resulting power efficiency is 42.4 Gops/W, 3.1x better than a graphics processing unit in a similar technology.
Read full abstract