This paper studies packet repetition strategies over erasure channels with memory and a long feedback delay. The problem is first formulated as a communications problem in which a source wishes to transmit one message packet to a destination while minimizing both the delay and the number of transmissions. At each time instant, the sender receives delayed acknowledgement feedback about past attempts and must decide whether or not to attempt a new transmission. The problem is then re-formulated as an episodic reinforcement learning problem in which an agent attempts to learn the optimal transmission policy from delayed feedback about past transmission attempts. The agent is aided by a channel estimator, which attempts to capture the channel memory and uses it to predict the probabilities of erasures in a future window. This channel estimator is also data-driven and learns the channel model without any a priori channel knowledge. The paper presents a lower bound on the achievable trade-off between delay and number of transmissions for any channel modeled as a Markov process. Experimental results show that the combination of the proposed channel estimator and the agent can noticeably outperform naive strategies for channels with memory and achieve results close to the lower bound.
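To make the setting concrete, the sketch below shows one possible instance of the problem under stated assumptions: a two-state (Gilbert-Elliott-style) Markov erasure channel and a fixed feedback delay, with a naive always-transmit baseline policy. The paper does not specify this exact channel model or interface; all class, function, and parameter names here are hypothetical and chosen only for illustration.

```python
# Minimal sketch, NOT the paper's implementation: assumes a two-state Markov
# erasure channel and a fixed acknowledgement delay. All names are hypothetical.
import random


class MarkovErasureChannel:
    """Two-state Markov channel: state 0 = 'good', state 1 = 'bad'."""

    def __init__(self, p_g2b=0.1, p_b2g=0.3, eps_good=0.05, eps_bad=0.8):
        self.p_g2b, self.p_b2g = p_g2b, p_b2g   # state-transition probabilities
        self.eps = (eps_good, eps_bad)          # erasure probability in each state
        self.state = 0

    def step(self):
        """Advance the channel one slot; return True if a transmission is erased."""
        if self.state == 0 and random.random() < self.p_g2b:
            self.state = 1
        elif self.state == 1 and random.random() < self.p_b2g:
            self.state = 0
        return random.random() < self.eps[self.state]


def run_episode(policy, channel, feedback_delay=5, horizon=200):
    """One episode: deliver a single packet under delayed ACK feedback.

    `policy(t, visible)` returns True to transmit in slot t, given the list of
    (slot, erased) outcomes that are at least `feedback_delay` slots old.
    Returns (slot of first successful transmission or None, number of transmissions).
    """
    outcomes = []          # (slot, erased) for every transmission attempt
    transmissions = 0
    delivered_at = None
    for t in range(horizon):
        # Feedback visible to the sender: only outcomes old enough to have been ACKed.
        visible = [o for o in outcomes if t - o[0] >= feedback_delay]
        if any(not erased for _, erased in visible):
            break          # delayed ACK received; the sender stops
        if policy(t, visible):
            transmissions += 1
            erased = channel.step()
            outcomes.append((t, erased))
            if not erased and delivered_at is None:
                delivered_at = t
        else:
            channel.step()  # channel state evolves even when the sender stays idle
    return delivered_at, transmissions


# Naive baseline: transmit in every slot until an ACK is seen.
delay, cost = run_episode(lambda t, fb: True, MarkovErasureChannel())
print(delay, cost)
```

Under this kind of interface, a learned policy would replace the always-transmit lambda, trading a slightly larger delivery delay for fewer transmissions by exploiting the channel memory exposed through the delayed feedback.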