In this paper, a joint non-orthogonal multiple access (NOMA) clustering and power allocation problem is studied to maximize the uplink sum rate in Internet of Remote Things (IoRT)-oriented satellite-terrestrial relay networks (STRNs). The joint optimization problem is a mixed-integer programming (MIP) problem, which is non-convex and NP-hard. To solve it, we decompose it into two subproblems and propose a staged algorithm for each. The first subproblem is an optimal NOMA clustering problem, which remains non-convex and NP-hard. To solve it efficiently, a reinforcement learning-based dynamic clustering algorithm (RL-DCA) is proposed. With the RL-DCA, the users gradually learn the optimal NOMA clustering policy in a distributed fashion. The RL-DCA converges quickly, and its computational complexity at each user remains fixed regardless of network size. The second subproblem is an intra-cluster optimal NOMA power allocation problem, which is also non-convex. To solve it, we first convert it into a convex problem by constraint approximation, and then use an algorithm based on the Karush-Kuhn-Tucker (KKT) conditions to find the optimal power allocation policy. The second subproblem is solved offline, which keeps the computational complexity low during transmission. Simulation results show that: 1) the RL-DCA converges quickly and achieves good clustering performance; 2) the joint NOMA clustering and power allocation scheme greatly outperforms the orthogonal multiple access (OMA) scheme and other NOMA schemes in terms of sum rate.
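As an illustration of why grouping users into NOMA clusters can raise the uplink sum rate relative to OMA, the following minimal sketch compares the two for a single cluster. It is not the system model or algorithm of this paper: it assumes a textbook single-antenna uplink with successive interference cancellation (SIC) that decodes the strongest received signal first, an OMA baseline that splits time equally among the users, and hypothetical function names, powers, channel gains, and noise power chosen only for the example.

```python
import math


def noma_uplink_rates(powers, gains, noise):
    """Per-user rates (bit/s/Hz) for one uplink NOMA cluster with SIC.

    Assumption: the receiver decodes users in descending order of received
    power, so each user only sees interference from users decoded after it.
    """
    received = sorted((p * g for p, g in zip(powers, gains)), reverse=True)
    rates = []
    for i, signal in enumerate(received):
        interference = sum(received[i + 1:])  # users not yet cancelled
        rates.append(math.log2(1 + signal / (interference + noise)))
    return rates


def oma_uplink_rates(powers, gains, noise):
    """Per-user rates (bit/s/Hz) for an equal-time-share OMA baseline."""
    k = len(powers)
    return [math.log2(1 + p * g / noise) / k for p, g in zip(powers, gains)]


# Hypothetical three-user cluster (illustrative values only).
powers = [0.2, 0.2, 0.2]
gains = [4.0, 1.0, 0.25]
noise = 0.1

noma_sum = sum(noma_uplink_rates(powers, gains, noise))
oma_sum = sum(oma_uplink_rates(powers, gains, noise))
print(f"NOMA sum rate: {noma_sum:.3f} bit/s/Hz")
print(f"OMA sum rate:  {oma_sum:.3f} bit/s/Hz")
```

Under these assumptions the NOMA sum rate equals log2(1 + total received power / noise), so it depends on which users share a cluster and how much power each transmits. These are the two quantities the clustering and power allocation subproblems decide.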