Abstract

Bayesian Convolutional Neural Networks (BCNNs) have emerged as a robust form of Convolutional Neural Networks (CNNs) with the capability of uncertainty estimation. A BCNN model is implemented by adding a dropout layer after each convolutional layer in the original CNN. By executing stochastic inference many times, BCNNs are able to provide an output distribution that reflects the uncertainty of the final prediction. The repeated inferences in this process lead to much longer execution time, which makes it challenging to apply the Bayesian technique to CNNs in real-world applications. In this study, we propose Fast-BCNN, an FPGA-based hardware accelerator design that intelligently skips redundant computations for two types of neurons during repeated BCNN inferences. First, within a sample inference, we skip the dropped neurons that are predetermined by the dropout masks. Second, by leveraging information from the first inference and the dropout masks, we predict the zero neurons and skip all of their corresponding computations during the following sample inferences. In particular, an optimization algorithm is employed to guarantee the accuracy of zero-neuron prediction while achieving maximal computation reduction. To support our neuron-skipping strategy at the hardware level, we explore an efficient parallelism for CNN convolution that gracefully skips the corresponding computations for both types of neurons, and we propose a novel processing element (PE) architecture that accommodates the parallel operation of convolution and prediction with negligible overhead. Experimental results demonstrate that Fast-BCNN achieves a 2.1×–8.2× speedup and 44%–84% energy reduction over the baseline CNN accelerator.
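
The two skipping ideas in the abstract can be illustrated with a minimal software sketch of Monte Carlo dropout over a toy 1x1-convolution layer. This is not the paper's actual algorithm, prediction rule, or hardware dataflow: the "zero after ReLU in the first sample stays zero" rule, the layer shapes, and the function name mc_dropout_conv below are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_dropout_conv(x, w, drop_p=0.5, n_samples=8):
        """Toy 1x1-conv layer with Monte Carlo dropout and neuron skipping.

        x: (C_in, H, W) input feature map; w: (C_out, C_in) 1x1 weights.
        Returns stacked post-ReLU outputs of shape (n_samples, C_out, H, W).
        """
        c_out, (h, wd) = w.shape[0], x.shape[1:]
        outputs = np.empty((n_samples, c_out, h, wd))

        # Sample 0 is computed in full; the neurons it leaves at zero after
        # ReLU serve as the (assumed) prediction of which neurons can be
        # skipped in later samples.
        y0 = np.maximum(np.einsum('oc,chw->ohw', w, x), 0.0)
        mask0 = rng.random((c_out, h, wd)) > drop_p   # dropout mask, sample 0
        outputs[0] = y0 * mask0 / (1.0 - drop_p)
        predicted_zero = (y0 == 0.0)

        for s in range(1, n_samples):
            mask = rng.random((c_out, h, wd)) > drop_p
            # Skip (a) dropped neurons, known in advance from the mask, and
            # (b) neurons predicted to remain zero based on sample 0.
            live = np.argwhere(mask & ~predicted_zero)
            y = np.zeros((c_out, h, wd))
            for o, i, j in live:   # dot products only for live neurons
                y[o, i, j] = max(w[o] @ x[:, i, j], 0.0)
            outputs[s] = y / (1.0 - drop_p)
        return outputs

    # The per-neuron variance across samples is the uncertainty estimate.
    x = rng.standard_normal((4, 6, 6))
    w = rng.standard_normal((3, 4))
    samples = mc_dropout_conv(x, w)
    uncertainty = samples.var(axis=0)

In this sketch the computation saved per sample grows with the dropout rate and with the sparsity that ReLU induces, which is the intuition behind the speedups the abstract reports; the accelerator's actual scheduling of these skips across PEs is described in the full paper.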
