Abstract

This paper aims to solve the optimization problem in far-field wireless power transfer systems using deep reinforcement learning techniques. The Radio-Frequency (RF) wireless transmitter is mounted on a mobile robot, which patrols near the energy-harvesting-enabled Internet of Things (IoT) devices. The wireless transmitter continuously cruises on a designated path in order to fairly charge all the stationary IoT devices in the shortest time. The Deep Q-Network (DQN) algorithm is applied to determine the optimal path for the robot to cruise on. When the number of IoT devices increases, the traditional DQN cannot converge to a closed-loop path or achieve the maximum reward. To solve these problems, an area division Deep Q-Network (AD-DQN) is proposed. The algorithm intelligently divides the complete charging field into several areas; in each area, the DQN algorithm is used to calculate the optimal path, and the segmented paths are then combined into a closed-loop path on which the robot can continuously charge all the IoT devices in the shortest time. Numerical results demonstrate the superiority of the AD-DQN in optimizing the proposed wireless power transfer system.
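To make the area-division idea concrete, the sketch below partitions the charging field into strips, plans a sub-path inside each strip, and stitches the sub-paths into one closed loop. This is an illustrative sketch only: the grid width, device coordinates, and the nearest-neighbour planner (which stands in for the per-area DQN described in the paper) are assumptions, not the authors' implementation.

```python
# Illustrative sketch of the area-division scheme: split the charging field
# into vertical strips, plan a sub-path inside each strip, then stitch the
# sub-paths into one closed-loop patrol path. The per-area planner below is
# a simple nearest-neighbour stand-in for the per-area DQN used in the paper.
from typing import List, Tuple

Point = Tuple[int, int]

def divide_field(devices: List[Point], width: int, n_areas: int) -> List[List[Point]]:
    """Partition the devices into n_areas vertical strips of the charging field."""
    strip = width / n_areas
    areas: List[List[Point]] = [[] for _ in range(n_areas)]
    for (x, y) in devices:
        idx = min(int(x // strip), n_areas - 1)
        areas[idx].append((x, y))
    return areas

def plan_path_in_area(start: Point, devices: List[Point]) -> List[Point]:
    """Stand-in for the per-area DQN: visit devices in nearest-neighbour order."""
    path, current, remaining = [start], start, list(devices)
    while remaining:
        nxt = min(remaining, key=lambda d: abs(d[0] - current[0]) + abs(d[1] - current[1]))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path

def closed_loop_path(devices: List[Point], width: int, n_areas: int) -> List[Point]:
    """Combine the per-area sub-paths into one closed loop for the robot."""
    loop: List[Point] = []
    start: Point = (0, 0)
    for area in divide_field(devices, width, n_areas):
        sub = plan_path_in_area(start, area)
        loop.extend(sub if not loop else sub[1:])  # avoid duplicating the joint point
        start = loop[-1]
    loop.append(loop[0])  # return to the starting point to close the loop
    return loop

if __name__ == "__main__":
    devices = [(1, 4), (3, 1), (6, 5), (8, 2), (11, 6), (13, 3)]
    print(closed_loop_path(devices, width=15, n_areas=3))
```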

Highlights

  • This paper aims to solve the optimization problems in far-field wireless power transfer systems using deep reinforcement learning techniques. The Radio-Frequency (RF) wireless transmitter is mounted on a mobile robot, which patrols near the harvested energy-enabled Internet of Things (IoT) devices. The wireless transmitter intends to continuously cruise on the designated path in order to fairly charge all the stationary IoT devices in the shortest time. The Deep Q-Network (DQN) algorithm is applied to determine the optimal path for the robot to cruise on

  • In far-field wireless power transfer, the IoT devices use the electromagnetic waves from transmitters as the power source, and the effective charging distance ranges from 50 centimeters to 1.5 meters [3–5]

  • As the distance between the transceivers increases to 1.5 meters, the amount of harvested energy is less than 5 milliwatts, which is still not ideal for powering high-energy-consuming devices


Summary

Problem Formulation

The mobile wireless charging problem can be formulated as minimizing the time duration T for the robot to complete one loop, under the constraint that the robot passes through an effective area of every IoT device. The reward of the MDP is denoted as w(s, a, s′), which is defined for the transition of the system state from s to s′ under action a. The optimization problem is formulated as reaching the terminal state s_T in the fewest transmission time slots, so the reward has to be defined to motivate the mobile robot accordingly: Acc_{o_{k−1}} = 1 if the robot has already passed through the effective area of the o_{k−1}-th IoT device, and ζ denotes the unit price of the harvested energy. At each system state s, we derive the best action a∗(s) which generates the maximum reward. The optimal policy set is defined as π∗ = {a∗(s): s ∈ S}
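As a concrete illustration of this reward structure, the sketch below implements a reward of this shape in Python. The Acc flags, ζ, and the terminal state s_T follow the definitions above, while the per-slot step cost and terminal bonus values are illustrative assumptions rather than quantities from the paper.

```python
# Sketch of the MDP reward w(s, a, s') described above. The Acc flags record
# whether each IoT device's effective area has already been visited, zeta is
# the unit price of harvested energy, and the step cost encodes "fewest
# transmission time slots". Numerical values are illustrative assumptions.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class State:
    position: Tuple[int, int]          # robot cell on the grid
    acc: Tuple[int, ...]               # Acc_k = 1 if device k is already covered

def reward(s: State, s_next: State, harvested: Tuple[float, ...],
           zeta: float = 1.0, step_cost: float = 0.1,
           terminal_bonus: float = 10.0) -> float:
    """w(s, a, s'): pay a step cost, reward newly covered devices, bonus at s_T."""
    r = -step_cost                                   # every time slot has a cost
    for k, (before, after) in enumerate(zip(s.acc, s_next.acc)):
        if before == 0 and after == 1:               # first pass through area of device k
            r += zeta * harvested[k]
    if all(s_next.acc):                              # terminal state s_T reached
        r += terminal_bonus
    return r

if __name__ == "__main__":
    s = State(position=(2, 3), acc=(1, 0, 0))
    s_next = State(position=(2, 4), acc=(1, 1, 0))
    print(reward(s, s_next, harvested=(0.8, 0.6, 0.9)))   # -0.1 + 1.0 * 0.6 = 0.5
```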

Optimal Path Planning with Reinforcement Learning
Dueling Double DQN
Area Division Deep Reinforcement Learning
Conclusions
