Benefiting from the rapid development of nanotechnology, molecular communication (MC) has received considerable attention in recent years. In many potential applications of MC, such as drug delivery and pollution prevention, it is essential to locate or trace a target. In this paper, we consider a 3D diffusive MC environment consisting of several obstacles, a molecule-releasing source (RS), and a mobile molecule sensor (MS) that aims to find the RS within a time constraint. The problem is reformulated as a Markov decision process (MDP), and an adaptive multi-layer-reward-based Q-learning (AMR-Q Learning) approach is proposed. By exploiting the number of received molecules and adaptively setting multi-layer rewards, an MS using AMR-Q Learning can find the RS efficiently, unlike the gradient-based method, which usually becomes trapped at local optima. Numerical results demonstrate that the proposed AMR-Q Learning approach outperforms existing path-planning schemes while significantly reducing training overhead.
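
To make the MDP framing concrete, the following is a minimal sketch of tabular Q-learning on a small 3D grid in which the reward grows in layers with the sensed molecule count. It is not the AMR-Q Learning scheme itself, whose adaptive reward rule is not specified in this abstract. The sensing model `expected_count`, the thresholds in `layered_reward`, the grid size, the obstacle position, the terminal bonus, and all hyperparameters are hypothetical choices made for illustration only.

```python
import math
import random

# Six axis-aligned moves available to the mobile sensor (assumed action set).
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def expected_count(pos, source, strength=100.0):
    """Hypothetical sensing model: count decays as strength / (1 + distance)."""
    return strength / (1.0 + math.dist(pos, source))


def layered_reward(count, thresholds=(10.0, 25.0, 50.0)):
    """One reward unit per threshold the count reaches (illustrative layers)."""
    return float(sum(count >= t for t in thresholds))


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update on a dict-backed table; returns the new value."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(len(ACTIONS)))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def step(pos, action, size, obstacles):
    """Move one cell; stay in place if the move hits a wall or an obstacle."""
    nxt = tuple(p + d for p, d in zip(pos, ACTIONS[action]))
    if any(c < 0 or c >= size for c in nxt) or nxt in obstacles:
        return pos
    return nxt


def train(size=5, source=(4, 4, 4), start=(0, 0, 0),
          obstacles=frozenset({(2, 2, 2)}), episodes=200, max_steps=50,
          epsilon=0.2, seed=0):
    """Epsilon-greedy training loop; max_steps plays the role of the time limit."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pos = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)),
                             key=lambda a: q.get((pos, a), 0.0))
            nxt = step(pos, action, size, obstacles)
            reward = layered_reward(expected_count(nxt, source))
            if nxt == source:
                reward += 10.0  # assumed bonus for reaching the source
            q_update(q, pos, action, reward, nxt)
            pos = nxt
            if pos == source:
                break
    return q
```

Calling `train()` returns a dictionary that maps `(state, action)` pairs to learned values. The layered reward gives the sensor a coarse signal even far from the source, which is the general idea the abstract contrasts with following a local gradient.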