Value Iteration Residual Network with Self-attention

Jinyu Cai, Kenji Tei, Jialong Li, Zhenyu Mao

doi:10.1007/978-3-031-35501-1_2

Abstract

The Value Iteration Network (VIN) is a neural network widely used in path-finding reinforcement learning problems. The planning module in VIN enables the network to understand the nature of a problem, which gives the network an impressive generalization ability. However, reinforcement learning (RL) with VIN cannot guarantee efficient training, because of the network depth and the max-pooling operation. A deep network is harder to train from samples with gradient descent algorithms, and the max-pooling operation may make negative rewards harder to learn because it overestimates values. This paper proposes a new neural network, the Value Iteration Residual Network (VIRN) with Self-Attention, which uses a unique spatial self-attention module and aggressive iteration to address these two problems. A preliminary evaluation using Mr. Pac-Man demonstrated that VIRN improved training efficiency compared with VIN.
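
As a rough illustration of the planning module and the max-pooling operation mentioned above, the sketch below runs a VIN-style value iteration step on a small grid: a convolution over the stacked reward and value maps produces one Q-map per action, and a max over the action channel produces the next value map. This is a minimal NumPy sketch of the general VIN idea, not the architecture proposed in this paper. The function names (`make_kernels`, `value_iteration_step`, `plan`), the four-neighbour action set, and the hand-set kernel weights are assumptions made for illustration; in a real VIN the kernels are learned.

```python
import numpy as np

def make_kernels(gamma):
    """Hand-set 3x3 kernels for 4 hypothetical actions (assumption: VIN learns these)."""
    # Kernel positions of the neighbour each action looks at: up, down, left, right.
    offsets = [(0, 1), (2, 1), (1, 0), (1, 2)]
    kernels = np.zeros((len(offsets), 2, 3, 3))
    for a, (i, j) in enumerate(offsets):
        kernels[a, 0, 1, 1] = 1.0    # channel 0: reward of the current cell
        kernels[a, 1, i, j] = gamma  # channel 1: discounted value of one neighbour
    return kernels

def value_iteration_step(reward, value, kernels):
    """One planning step: Q = conv([R; V]), then V = max over the action channel."""
    h, w = reward.shape
    stacked = np.stack([reward, value])                  # shape (2, H, W)
    padded = np.pad(stacked, ((0, 0), (1, 1), (1, 1)))   # zero padding at the borders
    q = np.zeros((kernels.shape[0], h, w))
    for a in range(kernels.shape[0]):
        for i in range(3):
            for j in range(3):
                weights = kernels[a, :, i, j][:, None, None]
                q[a] += (weights * padded[:, i:i + h, j:j + w]).sum(axis=0)
    # Max-pooling over actions: the operation the abstract links to overestimation.
    return q.max(axis=0), q

def plan(reward, kernels, k):
    """Apply k planning steps starting from an all-zero value map."""
    value = np.zeros_like(reward, dtype=float)
    for _ in range(k):
        value, _ = value_iteration_step(reward, value, kernels)
    return value

reward = np.zeros((3, 3))
reward[0, 0] = 1.0                   # single rewarding cell in the top-left corner
kernels = make_kernels(gamma=0.9)
values = plan(reward, kernels, k=2)
```

Each call to `value_iteration_step` corresponds to one layer of depth in the planning module, so running many planning steps yields a deep network, which is the depth issue the abstract refers to.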

Full Text