Abstract

Non-orthogonal multiple access (NOMA) exploits the power domain to enhance connectivity for the Internet of Things (IoT). Because communication channels vary over time, dynamic user clustering is a promising method to increase the throughput of NOMA-IoT networks. This article develops an intelligent resource allocation scheme for uplink NOMA-IoT communications. To maximise the average sum rate, this work designs an efficient optimization approach based on two reinforcement learning algorithms, namely deep reinforcement learning (DRL) and SARSA-learning. For light traffic, SARSA-learning is used to explore the safest resource allocation policy at low cost. For heavy traffic, DRL is used to handle the large number of variables introduced by the traffic. With the aid of the considered approach, this work addresses two main problems of fair resource allocation in NOMA techniques: 1) allocating users dynamically and 2) balancing resource blocks and network traffic. We analytically demonstrate that the rate of convergence is inversely proportional to the network size. Numerical results show that: 1) compared with the optimal benchmark scheme, the proposed DRL and SARSA-learning algorithms have lower complexity with acceptable accuracy and 2) NOMA-enabled IoT networks outperform conventional orthogonal multiple access (OMA) based IoT networks in terms of system throughput.

Highlights

  • The Internet of Things (IoT) enables millions of devices to communicate simultaneously

  • We show that: 1) in a time-varying environment, resources can be assigned dynamically to IoT users based on our proposed framework; 2) for the proposed model, the learning rate α = 0.75 provides the best convergence and data rates; 3) for SARSA and deep reinforcement learning (DRL), the sum-rate is proportional to the number of users; 4) DRL with the ReLU activation function is more efficient than with TanH or Sigmoid; and 5) IoT networks with non-orthogonal multiple access (NOMA) provide better system throughput than those with orthogonal multiple access (OMA)

  • The proposed multi-constrained algorithms are tested under different network settings to solve two problems: 3D association among users, base stations (BSs), and sub-channels, and sum-rate optimization under different network traffic
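
To make the role of the learning rate in the highlights concrete, the following is a minimal sketch of a generic tabular SARSA update, not the authors' implementation. Only α = 0.75 comes from the text above; the discount factor, the exploration rate, the function names, and the encoding of states and actions as plain hashable values are assumptions made for illustration.

```python
import random
from collections import defaultdict

ALPHA = 0.75    # learning rate reported in the highlights
GAMMA = 0.9     # discount factor: assumed, not given in the text
EPSILON = 0.1   # exploration probability: assumed, not given in the text


def epsilon_greedy(q, state, actions, rng, epsilon=EPSILON):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=ALPHA, gamma=GAMMA):
    """On-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical usage: states and actions are placeholder integers here; in a
# NOMA-IoT setting an action could stand for one user/sub-channel assignment
# and the reward for the resulting sum rate.
q_table = defaultdict(float)
rng = random.Random(0)
action = epsilon_greedy(q_table, 0, [0, 1], rng)
next_action = epsilon_greedy(q_table, 1, [0, 1], rng)
sarsa_update(q_table, 0, action, 1.0, 1, next_action)
```

Because SARSA bootstraps from the action actually selected in the next state, it evaluates the policy it is following, which is consistent with the abstract's description of SARSA-learning as exploring a safe policy for light traffic.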

Summary

INTRODUCTION

The Internet of Things (IoT) enables millions of devices to communicate simultaneously. It is predicted that the number of IoT devices will increase rapidly in the coming decades [2]. Various model-based schemes have been proposed to improve different metrics of NOMA-IoT networks, such as coverage performance, energy efficiency, and system throughput (sum-rate). The sum-rate is widely used as a key performance indicator for wireless networks [7], [8], which underlines the significance of objective functions based on sum-rate maximization. Numerous model-based techniques aim to handle the dynamic behaviour of wireless networks but fail to provide long-term performance outcomes [9]–[13]. Because traditional schemes lack learning abilities, their computational complexity becomes extremely high when long-term network stability is required. This is because, by default, traditional approaches cannot extract knowledge from a given problem (e.g., given distributions) online. The online learning properties of recently developed machine learning (ML) methods are well suited to handling such dynamic problems [14].
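
As an illustration of the sum-rate metric discussed above, the following is a minimal sketch of the textbook uplink NOMA rate with successive interference cancellation (SIC), compared against an equal time-share OMA baseline. It is not the paper's signal model: the decoding order (strongest received power first), the normalised bandwidth, the equal OMA time shares, and all function names are assumptions.

```python
import math


def uplink_noma_rates(rx_powers, noise_power):
    """Per-user rates (bits/s/Hz) for one uplink NOMA cluster with SIC.

    Assumed decoding order: the base station decodes the strongest received
    signal first and treats the not-yet-decoded users as interference.
    """
    order = sorted(range(len(rx_powers)), key=lambda k: rx_powers[k], reverse=True)
    rates = [0.0] * len(rx_powers)
    remaining = sum(rx_powers)  # received power not yet cancelled
    for k in order:
        # Users decoded after k interfere with k; clamp rounding error at 0.
        remaining = max(remaining - rx_powers[k], 0.0)
        rates[k] = math.log2(1.0 + rx_powers[k] / (remaining + noise_power))
    return rates


def oma_rates(rx_powers, noise_power):
    """Per-user rates when each user gets an equal, interference-free time share."""
    share = 1.0 / len(rx_powers)
    return [share * math.log2(1.0 + g / noise_power) for g in rx_powers]


# Hypothetical received powers p_k * |h_k|^2 for a two-user cluster.
received = [4.0, 1.0]
noma_sum = sum(uplink_noma_rates(received, noise_power=1.0))
oma_sum = sum(oma_rates(received, noise_power=1.0))
```

Under these assumptions the NOMA per-user rates telescope to log2(1 + total received power / noise power), which is never smaller than the equal time-share OMA sum rate at the same received powers.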

Related Works and Motivations
Contributions and Organization
SYSTEM MODEL
NOMA Clusters
Signal Model
Problem Formulation
Markov Decision Process Model for Uplink NOMA
Complexity
NUMERICAL RESULTS
Convergence vs Sum Rate vs Traffic Density
DQN Loss vs Rewards
Clustering Time
CONCLUSION