In this paper, we consider a remote estimation problem in which multiple dynamical systems are observed by smart sensors that transmit their local estimates to a remote estimator over channels prone to packet losses. Unlike previous works, we allow multiple sensors to transmit simultaneously, even though they may interfere with one another, by exploiting the multi-packet reception capability of the remote estimator. The remote estimator can decode multiple sensor transmissions (successful packet arrivals) as long as their signal-to-interference-and-noise ratios (SINRs) are above a certain threshold. In this setting, we address the problem of optimal sensor transmission scheduling by minimizing a finite horizon discounted expected estimation error covariance cost across all systems at the remote estimator, subject to an average transmission cost. While this problem can be posed as a stochastic control problem, the optimal solution requires solving a Bellman equation for a dynamic programming (DP) problem, the complexity of which scales exponentially with the number of systems being measured and their state dimensions. We therefore resort to a novel Least Squares Temporal Difference (LSTD) Approximate Dynamic Programming (ADP) approach to approximate the value function. More specifically, we propose an off-policy LSTD approach, named Enhanced-Exploration Greedy LSTD (EG-LSTD). We discuss the convergence analysis of the EG-LSTD algorithm and its implementation. A Python program is developed to implement and analyse different aspects of the proposed method. Simulation examples are presented to support the results of the proposed approach for both the exact DP and ADP cases.
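The SINR-threshold decoding rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common interference model in which each sensor's SINR is its received power divided by the sum of all other received powers plus the noise power; the function name `decoded_sensors`, its parameters, and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def decoded_sensors(tx_powers, channel_gains, noise_power, threshold):
    """Return (mask, sinr) for a set of simultaneously transmitting sensors.

    Assumed model (not taken from the paper): the SINR of sensor i is
    g_i * p_i / (sum_{j != i} g_j * p_j + noise_power), and the packet of
    sensor i is decoded when that SINR is at least `threshold`.
    A sensor that stays silent has tx_power 0 and therefore SINR 0.
    """
    received = np.asarray(tx_powers, dtype=float) * np.asarray(channel_gains, dtype=float)
    # Interference seen by each sensor: total received power minus its own.
    interference = received.sum() - received
    sinr = received / (interference + noise_power)
    return sinr >= threshold, sinr


# Three sensors; the third one does not transmit in this time slot.
mask, sinr = decoded_sensors(
    tx_powers=[1.0, 1.0, 0.0],
    channel_gains=[4.0, 1.0, 2.0],
    noise_power=1.0,
    threshold=1.0,
)
print(mask)  # which packets arrive at the remote estimator
print(sinr)  # per-sensor SINR under the assumed model
```

Under this assumed model, a scheduling decision amounts to choosing which entries of `tx_powers` are nonzero in each time slot: every additional transmitter raises the interference seen by the others, which is the trade-off that the scheduling problem has to balance against the transmission cost.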