Abstract
Communications technology has recently undergone a significant revolution, driven by the emergence of beyond‐fifth‐generation (B5G) and sixth‐generation (6G) networks. These networks respond to the growing demand for connectivity and improve the speed and quality of mobile connections. To improve their energy and spectral efficiency, non‐orthogonal multiple access (NOMA) is seen as a key solution that can accommodate more users and substantially improve spectral efficiency. The basic idea of NOMA is to achieve multiple access in the power domain and to decode the desired signal via successive interference cancellation (SIC). A resource allocation approach is proposed for B5G/6G‐NOMA networks that aims to maximise system throughput, spectral and energy efficiency, and fairness among users while minimising latency. The proposed approach is based on reinforcement learning (RL) using the Q‐Learning algorithm. First, the resource allocation process is formulated as a reward maximisation problem. Next, the Q‐Learning algorithm is used to design an RL‐based resource allocation algorithm. Simulation results confirm that the proposed scheme is feasible and efficient.
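To make the idea concrete, the following is a minimal sketch of tabular Q‐Learning applied to power allocation in a two‐user downlink NOMA link. It is not the scheme evaluated in the paper. The state space (a discretised weak‐user channel gain), the action set (the weak user's power share), the proportional‐fairness reward, the i.i.d. fading model, and all numeric values and function names are assumptions chosen only for illustration.

```python
import math
import random

# All values below are illustrative assumptions, not taken from the paper.
POWER_SPLITS = [0.6, 0.7, 0.8, 0.9]  # actions: fraction of power given to the weak user
WEAK_GAINS = [0.5, 1.0, 2.0]         # states: discretised weak-user channel gains
STRONG_GAIN = 12.0                   # fixed strong-user channel gain
TOTAL_POWER = 1.0
NOISE = 1.0


def noma_rates(weak_share, strong_gain, weak_gain, power=TOTAL_POWER, noise=NOISE):
    """Return (strong_rate, weak_rate) in bit/s/Hz for two-user downlink NOMA."""
    strong_share = 1.0 - weak_share
    # The strong user removes the weak user's signal with SIC, so it sees noise only.
    strong_rate = math.log2(1.0 + strong_share * power * strong_gain / noise)
    # The weak user treats the strong user's signal as interference.
    weak_sinr = weak_share * power * weak_gain / (strong_share * power * weak_gain + noise)
    weak_rate = math.log2(1.0 + weak_sinr)
    return strong_rate, weak_rate


def reward(state, action):
    """Assumed reward: sum of log rates, which trades throughput against fairness."""
    strong_rate, weak_rate = noma_rates(POWER_SPLITS[action], STRONG_GAIN, WEAK_GAINS[state])
    return math.log(strong_rate) + math.log(weak_rate)


def train(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run epsilon-greedy tabular Q-Learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(POWER_SPLITS) for _ in WEAK_GAINS]
    state = rng.randrange(len(WEAK_GAINS))
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(POWER_SPLITS))  # explore
        else:
            action = max(range(len(POWER_SPLITS)), key=lambda a: q[state][a])  # exploit
        r = reward(state, action)
        # Assumption: the channel state changes independently at every step.
        next_state = rng.randrange(len(WEAK_GAINS))
        # Standard Q-Learning update toward the bootstrapped target.
        q[state][action] += alpha * (r + gamma * max(q[next_state]) - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Map each state to the power split with the highest learned Q-value."""
    return [POWER_SPLITS[max(range(len(row)), key=lambda a: row[a])] for row in q]


if __name__ == "__main__":
    q_table = train()
    for gain, split in zip(WEAK_GAINS, greedy_policy(q_table)):
        print(f"weak-user gain {gain}: give {split:.1f} of the power to the weak user")
```

A full B5G/6G‐NOMA allocator would also need to cover subchannel assignment, more than two users, and latency terms in the reward. The sketch only shows how a reward maximisation problem maps onto a Q‐table update.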