Random access in satellite Internet-of-Things (IoT) networks is regarded as a key technology for new types of machine-to-machine (M2M) communication. However, the complex operating environment and long transmission distances make existing random access schemes unsuitable for satellite IoT networks. This article studies the random access problem in satellite IoT networks. A novel random access scheme for machine-type-communication devices (MTCDs) is proposed to maximize random access efficiency for both contention-based and contention-free random access. Given a set of random access opportunities (RAOs) and a delay constraint, a random access control model is designed to maximize random access efficiency. Based on this model, a model-free deep reinforcement learning (DRL) algorithm is proposed to solve the problem. Subsequently, a deep Dyna-Q learning algorithm is introduced for the same random access control model; in this scheme, the model-free DRL algorithm is developed using simulated experience. The performance of the proposed algorithms is discussed, and simulation results show that the proposed DRL methods perform well across different system parameters.
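To make the idea of learning from simulated experience concrete, the following is a minimal tabular Dyna-Q sketch in Python. It is not the deep Dyna-Q algorithm of this article: the Q-function is a plain table rather than a neural network, and the environment is a toy assumption. In particular, the collision model in `access_efficiency`, the load levels in `LOADS`, the action set `RAO_CHOICES`, the deterministic load cycle in `env_step`, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import random

LOADS = [2, 4, 8]        # hypothetical numbers of contending MTCDs (states)
RAO_CHOICES = [2, 4, 8]  # hypothetical numbers of RAOs to open (actions)


def access_efficiency(n_devices, n_raos):
    """Expected fraction of RAOs chosen by exactly one device when each
    device picks one RAO uniformly at random (assumed toy collision model)."""
    singles = n_devices * (1.0 - 1.0 / n_raos) ** (n_devices - 1)
    return singles / n_raos


def env_step(state, action):
    """Toy environment: reward is the access efficiency, and the load level
    cycles deterministically regardless of the action (an assumption)."""
    reward = access_efficiency(LOADS[state], RAO_CHOICES[action])
    next_state = (state + 1) % len(LOADS)
    return reward, next_state


def dyna_q(steps=3000, planning_steps=10, alpha=0.5, gamma=0.8,
           epsilon=0.2, seed=0):
    """Tabular Dyna-Q: real Q-learning updates plus replayed model updates."""
    rng = random.Random(seed)
    n_s, n_a = len(LOADS), len(RAO_CHOICES)
    q = [[0.0] * n_a for _ in range(n_s)]
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def update(s, a, r, s2):
        # One-step Q-learning backup.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection on the current Q-table.
        if rng.random() < epsilon:
            action = rng.randrange(n_a)
        else:
            action = max(range(n_a), key=lambda a: q[state][a])
        reward, next_state = env_step(state, action)
        update(state, action, reward, next_state)      # learn from real experience
        model[(state, action)] = (reward, next_state)  # record it in the model
        for _ in range(planning_steps):
            # Planning: replay transitions sampled from the learned model.
            s, a = rng.choice(list(model))
            r, s2 = model[(s, a)]
            update(s, a, r, s2)
        state = next_state
    return q, model


def greedy_policy(q):
    """Number of RAOs the greedy policy opens at each load level."""
    return [RAO_CHOICES[max(range(len(row)), key=lambda a: row[a])]
            for row in q]
```

Calling `greedy_policy(dyna_q()[0])` returns the number of RAOs chosen at each load level. In this toy setup the choice tends toward one RAO per contending device, which is where the assumed collision model peaks over the listed actions. The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real transition is followed by several updates drawn from the stored model.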