Abstract
Multi-task learning leaves considerable room for performance improvement over single-task learning when the learned tasks are related and share mutual information. In this work, we analyze the efficiency of using a single-task reinforcement learning algorithm to mitigate jamming attacks with a frequency hopping strategy. Our findings show that single-task learning implementations do not always guarantee optimal cumulative reward when some of the jammer's parameters are unknown, notably the jamming time-slot length in this case. Therefore, to maximize packet transmission in the presence of a jammer whose parameters are unknown, we propose deep multi-task conditional and sequential learning (DMCSL), a multi-task learning algorithm that builds a transition policy to optimize conditional and sequential tasks. For the anti-jamming system, the proposed model learns two tasks: sensing time and transmission channel selection. DMCSL is a composite of the state-of-the-art reinforcement learning algorithms, multi-armed bandit and an extended deep Q-network. To improve the chance of convergence and the optimal cumulative reward of the algorithm, we also propose a continuous action-space update algorithm for the sensing time action-space. The simulation results show that DMCSL guarantees better performance than single-task learning by relying on a logarithmically increased action-space sample. Against a random dynamic jamming time-slot, DMCSL achieves about three times better cumulative reward, and against a periodic dynamic jamming time-slot, it improves the cumulative reward by 10%.
Highlights
In wireless communication, network interference happens when nearby communicating nodes transmit at the same time on close frequencies, resulting in a jamming attack if done intentionally
By following a decaying ε-greedy policy, at each iteration the agent either explores the transmission in the environment to learn the jammer activities or exploits the transmission with the higher expected reward based on previously computed statistics; the problem is solved by calculating the optimal state-action value function using a deep Q-network (DQN)
The analysis showed that designing anti-jamming learning systems as single-task learning for the transmission channel selection does not always guarantee optimal performance in the long run if the sensing time used is not optimal
Summary
In wireless communication, network interference happens when nearby communicating nodes transmit at the same time on close frequencies, resulting in a jamming attack if done intentionally. We formulate anti-jamming as single-task learning in which the transmitter agent interacts with the environment (made of receiver, jammer, and other transmitter nodes) in a sequence of state S, action A, and reward r. By following a decaying ε-greedy policy, at each iteration the agent either explores the transmission in the environment to learn the jammer activities or exploits the transmission with the higher expected reward based on previously computed statistics; the problem is solved by calculating the optimal state-action value function using a deep Q-network (DQN). Both the jammer and the transmitter abide by the Assumption II.. When a transmitter agent does not know the jammer's internal working details, solving jamming attacks as a single-task RL problem, which only learns a channel hopping policy without optimizing the sensing time, does not always guarantee the optimal accumulated reward, especially against a dynamic jamming time-slot.
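The explore-or-exploit loop described above can be illustrated with a small sketch. It is not the paper's implementation: a per-channel running average stands in for the DQN, the jammer is a hypothetical sweeping jammer that moves to the next channel every `jam_period` steps, and all names and parameter values (`decayed_epsilon`, `select_channel`, `run_episode`, the decay schedule, the 0/1 reward) are assumptions made for illustration only.

```python
import math
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.01):
    """Exponentially decay the exploration rate from eps_start toward eps_min."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay * step)


def select_channel(q_values, epsilon, rng):
    """With probability epsilon explore a random channel, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: pick the channel with the highest estimated reward.
    return max(range(len(q_values)), key=q_values.__getitem__)


def run_episode(num_channels=5, steps=2000, jam_period=3, alpha=0.1, seed=0):
    """Toy transmitter loop against an assumed sweeping jammer."""
    rng = random.Random(seed)
    q_values = [0.0] * num_channels  # stand-in for the DQN's value estimates
    total_reward = 0
    for step in range(steps):
        # Hypothetical jammer: hops to the next channel every jam_period steps.
        jammed = (step // jam_period) % num_channels
        channel = select_channel(q_values, decayed_epsilon(step), rng)
        reward = 0 if channel == jammed else 1  # 1 = packet delivered
        # Running-average update of the chosen channel's estimated reward.
        q_values[channel] += alpha * (reward - q_values[channel])
        total_reward += reward
    return total_reward, q_values


if __name__ == "__main__":
    total, estimates = run_episode()
    print("cumulative reward:", total)
    print("channel estimates:", [round(q, 2) for q in estimates])
```

Because this sketch keeps no state about the jammer's timing, it cannot anticipate when the jammed channel changes, which is one way to see why learning the channel selection alone may fall short when the jamming time-slot length is unknown.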