In this paper we present an approach to training neural network to generate sequences using successor feature learning from reinforcement learning. The model can be thought as two components, an MLE-based token generator and an estimator that predicts the future value of whole sentence. As we know, reinforcement learning has been applied to dealing with the exposure bias problem of generating sequences. Compared with other RL algorithm, successor feature(SF) can learn robust value function provided observations and reward by decomposing the value function into two components - a reward predictor and a successor map. The encoder-decoder framework with SF enables the decoder to generate outputs that receive more future reward, which means that the model pays attention on not only the current word but also the rest words. We demonstrate that the approach improves performance on two translation tasks.
Read full abstract