Abstract
Machine-to-machine (M2M) communication automates data transfer and measurement between multiple devices with little or no human intervention. It offers many benefits, including the ability to handle large volumes of diverse data, the ability of devices to learn on their own, and better decision making. Despite these advantages, M2M faces major challenges, such as communication delay, mismatches in data acquisition, the need for additional resources, and high susceptibility to errors. To address these challenges, this work discusses various state-of-the-art deep reinforcement learning (DRL) algorithms. Deep Q-learning (DQN), dueling DQN, multi-step DQN, actor-critic (AC), advantage AC, REINFORCE, trust region policy optimization (TRPO), and proximal policy optimization (PPO) algorithms are investigated.