5G wireless networks require highly spectrally efficient multiple access techniques, which play an important role in determining the performance of mobile communication systems. Multiple access techniques can be classified as orthogonal or nonorthogonal according to how resources are allocated to users. This paper investigates the joint user association and resource allocation problem in an uplink multicast NOMA system, with the aim of maximizing power efficiency while guaranteeing the quality of experience of all subscribers. We also introduce an adaptive load balancing approach that aims to achieve “almost optimal” fairness among servers from the quality of service (QoS) perspective, in which learning automata (LA) are used to find the optimal solution to this dynamic problem. The approach relies on a learning automaton that combines the time-separation and “artificial” ergodic paradigms. Unlike conventional model-based resource allocation methods, this paper proposes a hierarchical reinforcement-learning-based framework to solve this non-convex and dynamic power optimization problem, referred to as the hierarchical deep learning-based resource allocation framework. All resource allocation policies of this framework are adjusted by updating the weights of their neural networks according to feedback from the system. The presented learning automaton finds the $$\epsilon$$-optimal solution to the problem by resorting to a two-time-scale SLA paradigm. Numerical results show that the proposed hierarchical resource allocation framework, in combination with the load balancing approach, can significantly improve the energy efficiency of the whole NOMA system compared with other approaches.
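For readers unfamiliar with learning automata, the following is a minimal sketch of a classical linear reward-inaction update applied to server selection. It is an illustration only and not the scheme developed in this paper: the function names `lri_update` and `run`, the learning rate, and the per-server reward probabilities are all assumptions introduced for the example.

```python
import random


def lri_update(probs, chosen, rewarded, rate=0.1):
    """Linear reward-inaction update (illustrative, not the paper's scheme).

    On a reward, probability mass moves toward the chosen action.
    On a penalty, the action probabilities are left unchanged.
    """
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else (1.0 - rate) * p
        for i, p in enumerate(probs)
    ]


def run(reward_probs, steps=5000, rate=0.05, seed=0):
    """Simulate an automaton choosing among servers.

    reward_probs[i] is a hypothetical probability that server i gives
    a favourable (e.g. QoS-satisfying) response when selected.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    probs = [1.0 / n] * n  # start from a uniform action distribution
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=probs)[0]
        rewarded = rng.random() < reward_probs[chosen]
        probs = lri_update(probs, chosen, rewarded, rate)
    return probs


if __name__ == "__main__":
    # Three hypothetical servers with different reward probabilities.
    print(run([0.2, 0.5, 0.8]))
```

With a small learning rate, an automaton of this kind tends to concentrate its probability on the action with the highest reward probability, which is the sense in which such schemes are usually called $$\epsilon$$-optimal.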