Mobile crowdsensing (MCS) is a cost-effective paradigm for gathering real-time, location-related urban sensing data. To complete MCS tasks, the MCS platform needs to exploit the trajectories of participants (e.g., vehicles or individuals) to select them effectively. On one hand, existing works usually assume that the platform already possesses abundant historical movement trajectories for participant selection, or can accurately predict participants' movements before selection; this assumption is impractical for many MCS applications, since some candidates have just arrived without sufficient mobility profiles, the so-called trajectory-from-scratch, or cold-trajectory, issue. On the other hand, most works consider only the coverage ratio of the sensing area, whereas some hotspots should be sensed frequently, the so-called coverage degree of hotspots. To address these issues, this paper proposes PSARE, a reinforcement learning (RL) based (specifically, an improved Q-learning based) online participant selection scheme that incorporates both coverage ratio and coverage degree. First, to avoid the state-value table explosion of traditional tabular Q-learning, an improved two-level Q-learning method is proposed to select participants in an online way so as to achieve a high long-term return. Specifically, in each selection round, PSARE dynamically compresses all real participants (RPs) into several virtual participants (VPs) using the available historical trajectories of the RPs, and a VP-based state-value table is constructed and continuously updated (the first level). Then, after selecting a VP by looking up this table, PSARE chooses the RP with the largest expected reward within that VP in an epsilon-greedy way to balance exploration and exploitation (the second level). Moreover, the reward function is designed to measure MCS coverage quality, including both the coverage degree of hotspots and the coverage ratio of the target area.
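The two-level selection described above can be sketched as follows. This is a minimal illustration under assumed data structures (a state-value dictionary keyed by state/VP pairs, a mapping from VPs to their RPs, and a per-RP expected-reward function), not the paper's actual implementation.

```python
import random

def select_participant(q_table, state, vp_members, expected_reward,
                       epsilon=0.1, rng=random):
    """Two-level selection sketch (illustrative names and structures).

    Level 1: pick the virtual participant (VP) with the highest Q-value
             for the current state from the VP-based state-value table.
    Level 2: within that VP, pick a real participant (RP) epsilon-greedily
             by its expected reward.
    """
    # Level 1: greedy lookup in the VP-based state-value table
    best_vp = max(vp_members, key=lambda vp: q_table.get((state, vp), 0.0))
    # Level 2: epsilon-greedy over the RPs compressed into the chosen VP
    rps = vp_members[best_vp]
    if rng.random() < epsilon:
        return best_vp, rng.choice(rps)            # explore: random RP
    return best_vp, max(rps, key=expected_reward)  # exploit: best expected reward
```

With `epsilon=0` the scheme is purely greedy; a larger epsilon trades immediate reward for exploration of RPs whose profiles are still sparse (the cold-trajectory case).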
Thorough experiments on a real-world mobility dataset demonstrate that PSARE outperforms other RL-based online participant selection schemes (including a deep Q-network) and traditional offline selection methods.
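A reward of the kind the abstract describes, combining the coverage ratio of the target area with the coverage degree of hotspots, might be sketched as below. The grid-cell representation, the capped per-hotspot degree, and the weight `w` are all illustrative assumptions, not the paper's definition.

```python
def coverage_reward(covered_cells, target_cells,
                    hotspot_counts, required_counts, w=0.5):
    """Illustrative coverage-quality reward (assumed form, not the paper's).

    covered_cells:   set of grid cells sensed this round
    target_cells:    set of all grid cells in the target area
    hotspot_counts:  dict hotspot -> times it was sensed this round
    required_counts: dict hotspot -> required sensing frequency
    w:               assumed weight between ratio and degree terms
    """
    # Coverage ratio: fraction of the target area that was sensed
    ratio = len(covered_cells & target_cells) / len(target_cells)
    # Coverage degree: per-hotspot sensing frequency, capped at its requirement
    degree = sum(min(hotspot_counts.get(h, 0), r) / r
                 for h, r in required_counts.items()) / len(required_counts)
    return w * ratio + (1 - w) * degree
```

Capping each hotspot's count at its requirement keeps over-sensing one hotspot from masking under-sensing another; the weight `w` would be tuned to the application's balance between area coverage and hotspot frequency.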