Resource allocation is a significant task in the emerging area of the Internet of Things (IoT). IoT devices are usually low-cost devices with limited computational power and limited capabilities for long-term communication. In this article, the network function virtualization (NFV) technique is used to access the resources of the network, and a reinforcement learning (RL) algorithm is used to solve the problem of resource allocation in IoT networks. The traffic of the IoT network is carried, for its data transmission, over the substrate network made available through NFV. The data transmission needs of the IoT network are translated into virtual requests, and service function chains (SFCs) are mapped onto the substrate network to serve these requests. The problem of SFC placement while meeting the system constraints of the IoT network is nonconvex. In the proposed deep RL (DRL)-based resource allocation scheme, the virtual layer acts as a common repository of the network resources. The optimization problem of SFC placement under the system constraints of IoT networks can be formulated as a Markov decision process (MDP). The MDP is solved through a multiagent DRL algorithm in which each agent serves one SFC. Two $Q$-networks are considered: one $Q$-network solves the SFC placement problem, while the other updates the weights of the first by keeping track of long-term policy changes. The virtual agents serving the SFCs interact with the environment, receive rewards collectively, and update their policies using the learned experiences. We show that the proposed scheme can solve the SFC placement optimization problem through adequate design of the reward and of the state and action spaces. Simulation results demonstrate that the multiagent DRL scheme outperforms the reference schemes in terms of the utility gained, as measured across different network parameters.
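The abstract describes a multiagent setup in which each agent places one SFC using an online $Q$-network, while a second $Q$-network tracks long-term policy changes to stabilize the weight updates. The paper's exact architecture, state and action encodings, reward, and hyperparameters are not given here; the following is a minimal, hypothetical PyTorch sketch of the general pattern (a per-agent online $Q$-network, a periodically synchronized second $Q$-network, and a collectively received reward). All names, dimensions, and constants are illustrative assumptions.

```python
# Minimal sketch of one SFC-placement agent with two Q-networks.
# Dimensions, hyperparameters, and the state/action encoding are assumptions,
# not the paper's actual implementation.
import random
import torch
import torch.nn as nn

STATE_DIM = 16      # assumed encoding of substrate/IoT network state
NUM_NODES = 10      # assumed number of candidate substrate nodes (actions)
GAMMA = 0.99        # discount factor
SYNC_EVERY = 100    # steps between synchronizations of the second Q-network


class QNetwork(nn.Module):
    """Maps a state vector to Q-values over candidate placement actions."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, NUM_NODES),
        )

    def forward(self, state):
        return self.net(state)


class SFCAgent:
    """One agent per SFC: an online Q-network plus a slowly updated copy."""

    def __init__(self):
        self.q = QNetwork()               # solves the SFC placement problem
        self.q_target = QNetwork()        # tracks long-term policy changes
        self.q_target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=1e-3)
        self.steps = 0

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy choice of a substrate node for the next placement.
        if random.random() < epsilon:
            return random.randrange(NUM_NODES)
        with torch.no_grad():
            return int(self.q(state).argmax())

    def learn(self, state, action, reward, next_state, done):
        # One-step TD target computed from the second (slow) Q-network;
        # `reward` is the reward received collectively by the agents.
        with torch.no_grad():
            target = reward + GAMMA * self.q_target(next_state).max() * (1.0 - done)
        pred = self.q(state)[action]
        loss = nn.functional.mse_loss(pred, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

        # Periodically copy the online weights into the second Q-network.
        self.steps += 1
        if self.steps % SYNC_EVERY == 0:
            self.q_target.load_state_dict(self.q.state_dict())
```

In a multiagent rollout under these assumptions, each agent would select placements for its own SFC, the environment would evaluate the joint placement against the IoT network constraints, and the resulting collective reward would be fed back to every agent's `learn` step.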