Abstract
As a technology that can accommodate more users and significantly improve spectral efficiency, non-orthogonal multiple access (NOMA) has attracted considerable research attention in recent years. The basic idea of NOMA is to implement multiple access in the power domain and decode the desired signal via successive interference cancellation (SIC). However, the resource allocation problem in such NOMA systems is non-convex and is difficult to solve directly with conventional methods. We therefore propose a reinforcement learning (RL) approach based on cooperative Q-learning to solve the resource allocation problem in multi-antenna downlink NOMA systems. First, we formulate the resource allocation process as a sum rate maximization problem, subject to power budget constraints and a quality of service (QoS) condition. Second, we design a reward function to improve the sum rate while meeting the power and capacity constraints. Multiple Q-tables are created and cooperatively updated to obtain the optimal beamforming matrix. Then, we analyze the convergence of the proposed RL-based power allocation method. Our simulations show that the proposed power allocation scheme yields excellent performance in terms of sum rate, energy efficiency, and spectral efficiency.
Highlights
The development of mobile internet and internet of things (IoT) has put forward challenging requirements for the fifth-generation wireless communication system (5G), which is expected to achieve higher spectral efficiency and lower latency [1], [2].
In this paper, we consider a downlink multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) communication scenario consisting of one base station (BS) and N users, where the users are randomly distributed at different distances from the BS and each is equipped with a single antenna.
Initialization: create Q-tables for each antenna; set the total iterations Itot and error threshold ζ; define state space and action space.
while error ≥ ζ and episodes ≤ Itot do
    initialize state s;
    for all steps of episode do
        for all Q-tables do
            choose a from the action space based on the ε-greedy algorithm;
        end
        perform action a and calculate the sum rate Csum of the system;
        for all Q-tables do
            measure the reward R and new state s′;
            update Q(s, a) according to the updating rule;
            let s = s′;
        end
    end
end

the SRMax method (SRPA) proposed in [33] as benchmarks
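The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cooperative_q_learning`, the caller-supplied `reward_fn`, the choice of each agent's previous action as its state, the standard Q-learning update with learning rate `alpha` and discount `gamma`, and all default values are hypothetical. The source specifies only that one Q-table is kept per antenna, that actions are chosen ε-greedily, and that all tables are updated from the resulting sum rate.

```python
import random

def cooperative_q_learning(n_agents, n_actions, reward_fn, max_episodes=500,
                           steps=20, alpha=0.1, gamma=0.9, epsilon=0.1,
                           zeta=1e-4, seed=0):
    """Train one Q-table per agent (per antenna in the source) on a shared reward."""
    rng = random.Random(seed)
    # Q[m][s][a]: agent m, state s, action a. Assumption: the state of an
    # agent is simply the action it took in the previous step.
    Q = [[[0.0] * n_actions for _ in range(n_actions)] for _ in range(n_agents)]
    for _ in range(max_episodes):
        states = [rng.randrange(n_actions) for _ in range(n_agents)]
        max_change = 0.0
        for _ in range(steps):
            # Each agent picks its action with an epsilon-greedy rule.
            actions = []
            for m in range(n_agents):
                if rng.random() < epsilon:
                    actions.append(rng.randrange(n_actions))
                else:
                    row = Q[m][states[m]]
                    actions.append(row.index(max(row)))
            # The joint action is evaluated once; every agent sees the same
            # reward (for example, the system sum rate).
            reward = reward_fn(actions)
            for m in range(n_agents):
                s, a, s_next = states[m], actions[m], actions[m]
                target = reward + gamma * max(Q[m][s_next])
                change = alpha * (target - Q[m][s][a])
                Q[m][s][a] += change
                max_change = max(max_change, abs(change))
                states[m] = s_next
        # Stop early once the largest update in an episode falls below zeta.
        if max_change < zeta:
            break
    return Q

# Hypothetical toy reward: highest when every agent selects action index 2.
toy_reward = lambda acts: 1.0 / (1.0 + sum((a - 2) ** 2 for a in acts))
tables = cooperative_q_learning(n_agents=2, n_actions=3, reward_fn=toy_reward)
```

In a NOMA setting, `reward_fn` would map the joint per-antenna actions to the sum rate, with a penalty when the power budget or QoS condition is violated; that mapping is left to the caller here because the source excerpt does not give the reward definition.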
Summary
The development of mobile internet and internet of things (IoT) has put forward challenging requirements for the fifth-generation wireless communication system (5G), which is expected to achieve higher spectral efficiency and lower latency [1], [2].
Related works: Research on resource allocation in NOMA communication systems has made progress in several areas, such as user pairing [15], [16], channel assignment, and power allocation [17], [18]. The problem of user association and channel assignment in downlink multi-cell NOMA networks is solved in [19], where the authors propose a low-complexity iterative solution to obtain the optimal power allocation while accounting for inter-user interference and maintaining QoS per user. In [20], the authors predefine several power allocation schemes and investigate the user pairing problem in NOMA uplink communication systems. To the best of our knowledge, there is currently no research on solving the power allocation problem of multi-antenna NOMA systems through a cooperative Q-learning algorithm. Unlike in single-antenna NOMA systems, channel gains in MISO NOMA systems are jointly determined by the multiple antennas between the BS and the users.
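To illustrate how the per-user gain in a MISO system depends on all antennas jointly, the sketch below computes a downlink NOMA sum rate from channel vectors and beamforming vectors. It rests on assumptions that are not taken from the source: users are listed weakest-first, user k removes the signals of users before it via SIC and treats the signals of users after it as interference, and the names `inner`, `noma_sum_rate`, and `noise_power` are hypothetical.

```python
import math

def inner(h, w):
    """Return h^H w for two equal-length complex vectors (one entry per antenna)."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))

def noma_sum_rate(H, W, noise_power=1.0):
    """Sum rate in bit/s/Hz for users listed weakest-first (assumed SIC order).

    H[k] is the channel vector of user k; W[k] is its beamforming vector.
    """
    total = 0.0
    for k, h_k in enumerate(H):
        # Effective gain |h_k^H w_k|^2 combines the contributions of all antennas.
        signal = abs(inner(h_k, W[k])) ** 2
        # Signals of users after k are not cancelled and act as interference.
        interference = sum(abs(inner(h_k, W[j])) ** 2
                           for j in range(k + 1, len(W)))
        total += math.log2(1.0 + signal / (interference + noise_power))
    return total

# Two users, two antennas: the weak user gets more power than the strong one.
H = [[1 + 0j, 0j], [2 + 0j, 0j]]
W = [[2 + 0j, 0j], [1 + 0j, 0j]]
print(noma_sum_rate(H, W))  # log2(3) + log2(5), about 3.907 bit/s/Hz
```

In this example the weak user has SINR 4 / (1 + 1) = 2 and the strong user, after cancelling the weak user's signal, has SINR 4 / 1 = 4.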