This paper investigates resource allocation in cognitive radio (CR) based mobile edge computing (MEC) systems that employ non-orthogonal multiple access (NOMA) over time-varying channels. In contrast to prior research on NOMA-MEC, we introduce a generalized user grouping scheme that allows each secondary user (SU) to participate in multiple NOMA groups simultaneously and thereby partially offload its task data to multiple MEC servers. To tackle the resulting non-convex long-term energy minimization problem, we propose a decomposition-based soft actor–critic (DB-SAC) approach that combines conventional convex optimization with deep reinforcement learning (DRL). The approach divides the problem into a joint communication and computation resource allocation subproblem and an offloading ratio optimization subproblem. The former is further decomposed into a series of local computation resource and offloading energy minimization problems, which are solved through theoretical analysis and convex optimization techniques. For the latter, we adopt the SAC algorithm to dynamically allocate the offloading ratios of the SUs. Simulation results show that, compared with the conventional non-generalized user grouping scheme, the generalized user grouping scheme achieves energy savings of up to 87.5%.
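To make the idea of partial offloading to multiple MEC servers concrete, the following minimal sketch shows one way an unconstrained action vector, such as an SAC policy might output, could be turned into valid offloading ratios. The softmax mapping, the function names `action_to_offloading_ratios` and `split_task`, and the convention that index 0 holds the locally computed share are all illustrative assumptions; the abstract does not specify how the ratios are parameterized.

```python
import math


def action_to_offloading_ratios(raw_action):
    """Map an unconstrained action vector to offloading ratios via softmax.

    Assumed convention (not taken from the paper): index 0 is the share of
    the task computed locally, and indices 1..M are the shares offloaded to
    MEC servers 1..M. The returned shares are nonnegative and sum to 1.
    """
    peak = max(raw_action)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in raw_action]
    total = sum(exps)
    return [e / total for e in exps]


def split_task(task_bits, ratios):
    """Split a task of `task_bits` bits into a local part and per-server parts."""
    local_bits = task_bits * ratios[0]
    offloaded_bits = [task_bits * r for r in ratios[1:]]
    return local_bits, offloaded_bits


# Hypothetical example: one SU, two MEC servers, equal raw action entries.
ratios = action_to_offloading_ratios([0.0, 0.0, 0.0])
local_bits, offloaded_bits = split_task(3000.0, ratios)
```

With equal raw entries the task is split evenly between local computation and the two servers. In a scheme like the one described, the per-server shares would then feed the communication and computation resource allocation subproblem.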