Abstract
The Multidimensional Long Short-Term Memory (MD-LSTM) neural network extends the one-dimensional LSTM to data with more than one dimension, enabling state-of-the-art results in applications such as handwritten text recognition and medical imaging. However, its highly sequential execution makes both training and inference far slower than for other neural networks. This is the primary reason intensive MD-LSTM research has stalled in recent years, despite large advances in microelectronics and architectures. The main goal of this work is to accelerate MD-LSTM inference, and thereby open the door to efficient training that can broaden the application of MD-LSTM. With this research we advocate FPGAs as an alternative platform for deep learning that can offer a solution when the massive parallelism of GPUs does not deliver the performance an application requires. In this paper, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration of the precision vs. accuracy trade-off using a challenging historical document image binarization dataset from the DIBCO 2017 contest and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement an FPGA-based accelerator that outperforms an NVIDIA K80 GPU implementation by up to 50x in runtime and by up to 746x in energy efficiency. At the same time, our accelerator achieves higher accuracy and comparable throughput relative to state-of-the-art FPGA-based implementations of multilayer perceptrons on the MNIST dataset.
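The sequential execution the abstract refers to can be illustrated with a minimal 2D-LSTM forward pass. This is only a sketch, not the paper's architecture: the gate layout (one forget gate per dimension), the weight shapes, and the function name `md_lstm_2d` are our assumptions. Each cell `(i, j)` reads the hidden and cell states of its top and left neighbors, so cells cannot all be computed in parallel; at best, anti-diagonals can be processed as wavefronts.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def md_lstm_2d(x, W, U1, U2, b, hidden):
    """Minimal 2D-LSTM forward pass over an H x W input grid (sketch).

    Cell (i, j) depends on the states of (i-1, j) and (i, j-1), which
    forces a sequential (at best diagonal-wavefront) schedule -- the
    bottleneck that motivates a dedicated hardware architecture.

    Assumed shapes: x (H, W, n_in); W (n_in, 5*hidden);
    U1, U2 (hidden, 5*hidden); b (5*hidden,).
    """
    H, Wd, _ = x.shape
    # Zero-padded state grids so border cells see all-zero neighbors.
    h = np.zeros((H + 1, Wd + 1, hidden))
    c = np.zeros((H + 1, Wd + 1, hidden))
    for i in range(1, H + 1):
        for j in range(1, Wd + 1):
            # Pre-activations for all five gates in one matrix product.
            z = x[i - 1, j - 1] @ W + h[i - 1, j] @ U1 + h[i, j - 1] @ U2 + b
            # input gate, two forget gates (one per dimension),
            # output gate, candidate cell state
            ig, f1, f2, og, g = np.split(z, 5)
            ig, f1, f2, og = sigmoid(ig), sigmoid(f1), sigmoid(f2), sigmoid(og)
            c[i, j] = ig * np.tanh(g) + f1 * c[i - 1, j] + f2 * c[i, j - 1]
            h[i, j] = og * np.tanh(c[i, j])
    return h[1:, 1:]
```

The double loop makes the dependency chain explicit: an `H x W` grid needs `H + W - 1` sequential wavefront steps no matter how many parallel units are available, which is why GPU-style data parallelism alone does not help here.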