Abstract

Deploying machine learning models on resource-limited embedded systems is increasingly desirable. This has created a need for resource-efficient building blocks that compute the mathematical operations required for neural network training and inference. Efficient activation functions are especially important for low-end devices with limited hardware capabilities. In this work, we present a method for generating symmetric and asymmetric activation functions for deep and convolutional neural networks. Furthermore, we propose a solution that computes a symmetric activation function with integrated scaling functionality for Long Short-Term Memory (LSTM) models. This effectively eliminates two of the three element-wise multipliers in an LSTM cell. Moreover, the built-in scaling requires no additional computation time because it is integrated within the computation of the symmetric non-linear mapping. This approach removes the need to compute several Tanh activation functions and element-wise multipliers separately. A resource-efficient approximate multiplier is also proposed to eliminate the third element-wise multiplier and potentially replace all the resource-hungry multipliers. The digital implementation of the proposed method is highly amenable to parallelization and is extremely resource-efficient. We record area savings on field-programmable gate arrays at different precisions. Our proposal's formulaic equivalents are also computationally fast on CPU-based engines. On an embedded ARM processor, our method achieves a speedup of at least 4.37× for the proposed functions. We show that LSTMs with our method can achieve up to a 3.5× resource footprint saving compared to the hard activation implementation. We demonstrate that our method achieves competitive results with negligible loss of performance.
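For orientation, the sketch below implements one step of a standard LSTM cell in plain NumPy and marks the two Tanh evaluations and the three element-wise multipliers the abstract refers to. The fused symmetric activation with integrated scaling and the approximate multiplier are the paper's contributions and are not reproduced here; all function and parameter names in the sketch are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (input size D, hidden size H).

    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked input, forget,
    cell-candidate, and output gate parameters (hypothetical layout).
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # cell candidate (Tanh #1)
    o = sigmoid(z[3*H:4*H])      # output gate
    # Element-wise multipliers #1 and #2: the two products the proposed
    # activation with integrated scaling is said to absorb.
    c = f * c_prev + i * g
    # Tanh #2 and element-wise multiplier #3: the remaining product the
    # abstract targets with a resource-efficient approximate multiplier.
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters (hypothetical sizes).
D, H = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)
```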
