Next-generation networks include dedicated Internet of Things (IoT) scenarios intended to improve the throughput of cellular networks. Grant-free non-orthogonal multiple access (GF-NOMA) is a promising solution, allowing machine-type communication (MTC) devices to transmit their packets as soon as they are ready. GF-NOMA increases spectral efficiency by superimposing signals with different power levels over the same time and frequency resources. However, its main drawbacks are the randomness of access and the management of power-level selection by MTC devices. In 6G-IoT networks, intelligence should be brought into random access, and new access methods are needed that address the GF-NOMA issues by operating between fully random and fully coordinated medium access. The Deep Q-Network (DQN) has attracted considerable research attention in recent years, as it allows MTC devices to make intelligent decisions that improve throughput. Selfishness, however, is an undesirable behavior of DQN in a GF-NOMA system where resources have different costs. In this study, we develop a novel learning framework for power-domain GF-NOMA. The goal of the framework is to maximize throughput while ensuring fairness in power consumption, which prolongs the lifetime of the IoT network. The learning algorithm pushes the MTC devices to exchange resources with each other over time. The results show that the proposed method outperforms random-selection NOMA in terms of throughput and achieves a higher fairness index than a DQN with selfish behavior.
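For illustration only, the fairness-in-power-consumption criterion mentioned above could be quantified with Jain's fairness index over per-device power use and combined with a throughput term in the agents' reward. The minimal Python sketch below reflects this assumption; it is not the paper's exact reward design, and the function names and the trade-off weight `beta` are hypothetical.

```python
import numpy as np

def jain_fairness(consumption):
    """Jain's fairness index over per-device power consumption.
    Equals 1.0 when all devices consume equally; tends toward 1/N when one device dominates."""
    c = np.asarray(consumption, dtype=float)
    if np.all(c == 0):
        return 1.0
    return (c.sum() ** 2) / (len(c) * np.sum(c ** 2))

def reward(successful_packets, consumption, beta=0.5):
    """Hypothetical reward mixing per-slot throughput with power-consumption fairness.
    beta (an assumption) trades off raw throughput against fairness."""
    throughput = float(successful_packets)   # packets decoded in this slot
    fairness = jain_fairness(consumption)    # in (0, 1]
    return (1.0 - beta) * throughput + beta * fairness

# Example: 3 MTC devices, 2 packets decoded, unequal cumulative power use
print(reward(successful_packets=2, consumption=[1.2, 0.3, 0.9]))
```

A reward of this form discourages the selfish behavior noted above: a device that keeps grabbing the cheapest (or highest-success) power level lowers the fairness term, so exchanging resources with other devices over time yields a higher long-run return.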