Temporal action recognition depends on temporal action proposal generation to hypothesize candidate actions. Proposal algorithms usually must process very long video sequences and output the start and end times of every potential action in each video, which incurs a high computational cost. To address this, building on the Boundary Sensitive Network, we propose a new temporal convolutional network called Multipath Temporal ConvNet (MTN), which consists of two parts: Multipath DenseNet and SE-ConvNet. We further introduce a novel high-performance ring parallel architecture into temporal action proposal generation. It is based on the Message Passing Interface (MPI), a reliable communication protocol, and is designed to cope with the large memory footprint and the large number of videos involved. Notably, the new architecture reduces the total data transmission by adding connections between the computing nodes. Compared with the traditional Parameter Server architecture, our parallel architecture is more efficient on the temporal action detection task with multiple GPUs, which makes it well suited to temporal action proposal generation, especially on large datasets containing millions of videos. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods, achieving high recall and high temporal precision. In addition, we propose a time metric to evaluate speed in the distributed training process.
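To make the ring idea concrete, the following is a minimal single-process sketch of a standard ring all-reduce, the textbook scheme in which workers exchange gradient chunks only with their ring neighbours instead of routing everything through a central parameter server. It is an illustration under stated assumptions, not the architecture proposed here: the function name `ring_allreduce`, the list-based "workers", and the chunking rule are all hypothetical, and real MPI communication is replaced by in-memory copies.

```python
def ring_allreduce(grads):
    """Simulate a sum all-reduce over a ring of workers.

    grads: list of equal-length lists, one gradient vector per worker
           (hypothetical stand-in for per-GPU gradients).
    Returns one list per worker; each holds the element-wise sum.
    """
    n = len(grads)
    length = len(grads[0])
    if any(len(g) != length for g in grads):
        raise ValueError("all workers must hold vectors of equal length")

    # Split the vector into n contiguous chunks (some may be empty).
    bounds = [(k * length) // n for k in range(n + 1)]

    def chunk(k):
        return range(bounds[k], bounds[k + 1])

    bufs = [list(g) for g in grads]  # each worker's local buffer

    # Phase 1 (scatter-reduce): after n-1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing messages so all sends in a step are simultaneous.
        msgs = []
        for i in range(n):
            k = (i - step) % n
            msgs.append((k, [bufs[i][j] for j in chunk(k)]))
        for i in range(n):
            dst = (i + 1) % n  # right-hand neighbour on the ring
            k, payload = msgs[i]
            for j, v in zip(chunk(k), payload):
                bufs[dst][j] += v

    # Phase 2 (all-gather): circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            k = (i + 1 - step) % n
            msgs.append((k, [bufs[i][j] for j in chunk(k)]))
        for i in range(n):
            dst = (i + 1) % n
            k, payload = msgs[i]
            for j, v in zip(chunk(k), payload):
                bufs[dst][j] = v

    return bufs


if __name__ == "__main__":
    workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for buf in ring_allreduce(workers):
        print(buf)
```

In this standard scheme each worker sends roughly 2(N-1)/N times the gradient size in total, nearly independent of the number of workers N, whereas a single parameter server must receive and send a full gradient copy for every worker. Whether the architecture described above uses exactly this schedule is not stated in the abstract.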