Device-to-device (D2D) communication is an emerging technology in 5G and upcoming 6G networks because it can enhance spectral efficiency (SE), energy efficiency (EE), and sum rate. Despite these advantages, co-channel interference, cross-channel interference, and ultra-massive connectivity are major issues that can degrade the performance of any solution deployed in this environment. To address these issues, in this paper we integrated power-domain non-orthogonal multiple access (PD-NOMA) at the base station (BS). NOMA serves more than one user on the same resource block (RB), and its successive interference cancellation (SIC) reduces the effect of interference at the cellular users (CUs). The problem is formulated as a mixed-integer nonlinear programming (MINLP) problem, subject to the resource and power constraints of the BS and the D2D pairs (DDPs), with the aim of maximizing the sum rate and fairness among the NOMA-enabled CUs and DDPs. We first used the centralized deep deterministic policy gradient (DDPG) together with the arithmetic-geometric mean approximation (AGMA) technique to reduce cross-channel interference (CR-CI) and control the power. Then, to provide fairness to all users, we transformed the proposed solution into a distributed deep deterministic policy gradient (D3PG). The successive convex approximation technique was then integrated into the D3PG to mitigate the effect of co-channel interference (CO-CI) among DDPs. The experimental results show that the proposed scheme outperforms the baselines in both sum rate and fairness: its sum rate is 21.05%, 34.21%, and 49.8% higher than that of DDPG, deep dueling, and DQN scheduling, respectively.
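To make the role of SIC and the two objectives concrete, the following is a minimal sketch of the textbook two-user downlink PD-NOMA rate model. It is not the system model or the optimization method of this paper. The function names, the two-user restriction, the power-split parameter `alpha_weak`, and the numeric values are all illustrative assumptions. The abstract also does not state which fairness metric is used, so Jain's index appears here only as a common stand-in.

```python
import math


def noma_two_user_rates(p_total, alpha_weak, g_strong, g_weak, noise):
    """Rates (bit/s/Hz) of a two-user downlink PD-NOMA pair on one RB.

    Hypothetical helper: alpha_weak is the fraction of BS power given to
    the user with the weaker channel (assumed to be in (0.5, 1)).
    """
    if not 0.5 < alpha_weak < 1.0:
        raise ValueError("alpha_weak must lie in (0.5, 1)")
    p_weak = alpha_weak * p_total
    p_strong = p_total - p_weak

    # The weak user decodes its own signal directly and treats the strong
    # user's superimposed signal as interference.
    rate_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + noise))

    # The strong user first decodes and subtracts the weak user's signal
    # (SIC), so only noise remains when it decodes its own signal.
    rate_strong = math.log2(1.0 + p_strong * g_strong / noise)
    return rate_strong, rate_weak


def jain_index(rates):
    """Jain's fairness index (assumed metric): 1/n (least fair) to 1 (equal)."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    return total * total / (len(rates) * squares) if squares > 0 else 0.0


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    r_strong, r_weak = noma_two_user_rates(
        p_total=1.0, alpha_weak=0.75, g_strong=4.0, g_weak=1.0, noise=1.0
    )
    print("sum rate:", r_strong + r_weak)
    print("fairness:", jain_index([r_strong, r_weak]))
```

In this sketch, raising `alpha_weak` increases the weak user's rate at the expense of the strong user's. That trade-off between sum rate and fairness is what a power-control policy has to balance.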