Abstract

Long short-term memory (LSTM) is an important model for sequential data processing. However, the large amount of matrix computation in the LSTM unit seriously slows down training as models grow larger and deeper and more data becomes available. In this work, we propose an efficient distributed duration-aware LSTM (D-LSTM) for large-scale sequential data analysis. We improve the training performance of LSTM from two aspects. First, the duration of each sequence item is exploited to design a computationally efficient cell, the duration-aware LSTM (D-LSTM) unit. With an additional mask gate, the D-LSTM cell perceives the duration of a sequence item and adapts its memory update accordingly. Second, on the basis of the D-LSTM unit, a novel distributed training algorithm is proposed, in which the D-LSTM network is divided logically and multiple distributed neurons perform the simpler linear calculations concurrently. Unlike the physical division used in model parallelism, this logical split based on hidden neurons greatly reduces communication overhead, a major bottleneck in distributed training. We evaluate the effectiveness of the proposed method on two video datasets. The experimental results show that our distributed D-LSTM greatly reduces training time and improves training efficiency for large-scale sequence analysis.
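To make the idea of a duration-aware cell concrete, the following is a minimal sketch of an LSTM cell extended with a mask gate driven by the duration of the current sequence item. The abstract does not give the D-LSTM equations, so the duration input d_t, the mask_gate layer, and the way the mask scales the memory update are assumptions for illustration only, not the paper's actual formulation.

```python
import torch
import torch.nn as nn

class DLSTMCell(nn.Module):
    """Illustrative duration-aware LSTM cell (hypothetical formulation).

    Standard LSTM gates plus a mask gate m_t conditioned on the duration d_t
    of the current sequence item; m_t scales how strongly the cell memory is
    updated at this step.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Fused weights for input, forget, output gates and candidate state.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        # Mask gate conditioned on the item duration (one scalar per step).
        self.mask_gate = nn.Linear(1 + hidden_size, hidden_size)

    def forward(self, x_t, d_t, state):
        h_prev, c_prev = state
        z = self.gates(torch.cat([x_t, h_prev], dim=-1))
        i, f, o, g = z.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        # Duration-dependent mask gate: items with longer duration are allowed
        # to update the memory more strongly (assumed behaviour).
        m = torch.sigmoid(self.mask_gate(torch.cat([d_t, h_prev], dim=-1)))
        c_t = f * c_prev + m * i * g   # adaptive memory update
        h_t = o * torch.tanh(c_t)
        return h_t, (h_t, c_t)


# Toy usage: batch of 2, input dim 8, hidden dim 16.
cell = DLSTMCell(8, 16)
x = torch.randn(2, 8)
d = torch.rand(2, 1)                   # per-item durations (e.g. in seconds)
h0 = torch.zeros(2, 16)
c0 = torch.zeros(2, 16)
h, (h1, c1) = cell(x, d, (h0, c0))
print(h.shape)                         # torch.Size([2, 16])
```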
