Abstract
Implementations of machine learning models on resource-limited embedded systems are increasingly in demand. This has created a need for resource-efficient building blocks that compute the mathematical operations required for neural network training and inference. In particular, efficient activation functions are important for low-end devices with limited hardware capabilities. In this work, we present a method for generating symmetric and asymmetric activation functions for deep and convolutional neural networks. Furthermore, we propose a solution that computes a symmetric activation function with integrated scaling functionality for Long Short-Term Memory (LSTM) models. This effectively eliminates two of the three element-wise multipliers in an LSTM cell. Moreover, the built-in scaling requires no additional computation time because it is integrated within the computation of the symmetric non-linear mapping, removing the need to compute several Tanh activation functions and element-wise multiplications separately. We also propose a resource-efficient approximate multiplier to eliminate the third element-wise multiplier and potentially replace all of the resource-hungry multipliers. The digital implementation of the proposed method is highly amenable to parallelization and is extremely resource-efficient. We record area savings on field-programmable gate arrays at different precisions. The formulaic equivalents of our proposal are also computationally fast on CPU-based engines. On an embedded ARM processor, our method achieves a speedup of at least 4.37× for the proposed functions. We show that LSTMs using our method can achieve up to a 3.5× reduction in resource footprint compared to the hard-activation implementation, and we demonstrate that our method achieves competitive results with negligible loss of performance.
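For context, the three element-wise multipliers referred to above are those of the standard LSTM cell update (given here as general background and not as a restatement of the proposed method):

$$
\tilde{c}_t = \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh\!\left(c_t\right),
$$

where $\odot$ denotes element-wise multiplication and $f_t$, $i_t$, $o_t$ are the forget, input, and output gates.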