We are motivated by the need, in emergency situations, for impromptu (or “as-you-go”) deployment of multihop wireless networks by human agents or robots (e.g., unmanned aerial vehicles (UAVs)); the agent moves along a line, makes wireless link quality measurements at regular intervals, and makes on-line placement decisions using these measurements. As a first step, we have formulated such deployment along a line as a sequential decision problem. In our earlier work, reported in [1], we proposed two possible deployment approaches: (i) the pure as-you-go approach, where the deployment agent can only move forward, and (ii) the explore-forward approach, where the deployment agent explores a few successive steps and then selects the best relay placement location among them. The latter was shown to provide better performance (in terms of network cost, network performance, and power expenditure), but at the expense of more measurements and a longer deployment time, which makes explore-forward impractical for quick deployment by an energy-constrained agent such as a UAV. Further, since in emergency situations the terrain would be unknown, the deployment algorithm should not require a-priori knowledge of the parameters of the wireless propagation model. In [1], we therefore developed learning algorithms for the explore-forward approach. The current paper fills an important gap by providing deploy-and-learn algorithms for the pure as-you-go approach. We formulate the sequential relay deployment problem as an average cost Markov decision process (MDP), which trades off among power consumption, link outage probabilities, and the number of relay nodes in the deployed network. While the pure as-you-go deployment problem was previously formulated as a discounted cost MDP (see [1]), the discounted cost formulation is not amenable to the learning algorithms proposed in this paper. In this paper, we first establish structural results for the optimal policy corresponding to the average cost MDP, providing new insights into the optimal policy. Next, by exploiting the special structure of the average cost optimality equation and by using the theory of asynchronous stochastic approximation (on single and two timescales), we develop two learning algorithms that asymptotically converge to the set of optimal policies as deployment progresses. Numerical results show reasonably fast convergence, and hence the model-free algorithms can be useful for practical, fast deployment of emergency wireless networks.
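As an illustrative sketch only (the abstract does not specify the per-step cost), the average cost criterion traded off here can be written in the standard long-run average form; the symbols $P_k$, $Q_k$, $R_k$ and the multipliers $\xi_o$, $\xi_r$ below are placeholders, not the paper's notation:

```latex
% Illustrative average-cost objective (placeholder symbols, not taken from the paper):
% the deployment policy \pi minimizes the long-run average per-step cost.
\begin{equation*}
  \lambda^{*} \;=\; \min_{\pi}\;
  \limsup_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}^{\pi}\!\left[\sum_{k=1}^{N}
  \Big( P_k \;+\; \xi_o\, Q_k \;+\; \xi_r\, R_k \Big)\right]
\end{equation*}
% P_k : transmit power assigned to the link placed or extended at step k
% Q_k : outage probability of that link
% R_k : indicator that a relay node is placed at step k
% \xi_o, \xi_r : nonnegative multipliers trading off outage and relay count against power
```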