Abstract

In this article, we propose an energy-efficient architecture that accepts both image and text inputs, as a step toward reinforcement learning agents that can understand human language and act in real-world environments. We evaluate the proposed method in three software environments and on a low-power drone, the Crazyflie, which must navigate toward specified goals while avoiding obstacles. To find the most efficient language-guided reinforcement learning model, we implement the model with various input image sizes and text instruction sizes on the drone's GAP8 system-on-chip, which consists of eight RISC-V cores. We measure task completion success rate together with the onboard power consumption, latency, and memory usage of GAP8, and compare them against a Jetson TX2 ARM CPU and a Raspberry Pi 4. The results show that decreasing the input image size by 20% yields up to a 78% energy improvement while maintaining an 82% task completion success rate.
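To make the idea concrete, below is a minimal, hypothetical sketch (not the paper's actual architecture) of a language-conditioned policy: an image feature vector and a text-instruction embedding are each encoded, fused by concatenation, and mapped to action logits. It also illustrates, with a toy convolution MAC count, why shrinking the input image side length by 20% cuts per-layer compute by roughly 36%, since cost scales quadratically with image side length. All dimensions and weights here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of one same-padded k x k conv layer."""
    return h * w * c_in * c_out * k * k

def policy(image, text_emb, w_img, w_txt, w_out):
    """Toy multimodal policy: fuse image and text features, return action logits."""
    img_feat = np.tanh(image.reshape(-1) @ w_img)   # image encoder (toy linear layer)
    txt_feat = np.tanh(text_emb @ w_txt)            # text-instruction encoder (toy)
    fused = np.concatenate([img_feat, txt_feat])    # multimodal fusion by concatenation
    return fused @ w_out                            # action logits

# Illustrative dimensions: 40x40 grayscale image, 16-dim instruction embedding, 4 actions.
img = rng.standard_normal((40, 40))
txt = rng.standard_normal(16)
w_img = rng.standard_normal((1600, 32)) * 0.01
w_txt = rng.standard_normal((16, 32)) * 0.01
w_out = rng.standard_normal((64, 4)) * 0.01

logits = policy(img, txt, w_img, w_txt, w_out)
print(logits.shape)  # (4,)

# A 20% smaller input side (40 -> 32 pixels) reduces conv MACs per layer by ~36%.
full = conv_macs(40, 40, 1, 8, 3)
small = conv_macs(32, 32, 1, 8, 3)
print(round(1 - small / full, 2))  # 0.36
```

The quadratic scaling of compute with image resolution is one plausible mechanism behind the reported energy savings on the eight-core GAP8, where every saved MAC translates directly into lower onboard power and latency.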
