Simultaneous multithreading (SMT) improves the performance of superscalar CPUs by exploiting thread-level parallelism with shared entries for better utilization of resources. A key issue for this out-of-order execution is that the occupancy latency of a physical rename register can be undesirably long due to many program execution-dependent factors that result in performance degradation. Such an issue becomes even more problematic in an SMT environment in which these registers are shared among concurrently running threads. Smartly managing this critical shared resource to ensure that slower threads do not block faster threads’ execution is essential to the advancement of SMT performance. In this paper, an actor–critic style reinforcement learning (RL) algorithm is proposed to dynamically assigning an upper-bound (cap) of the rename registers any thread is allowed to use according to the threads’ real-time demand. In particular, a critic network projects the current Issue Queues (IQ) usage, register file usage, and the cap value to a reward; an actor network is trained to project the current IQ usage and register file usage to the optimal real-time cap value via ascending the instructions per cycle (IPC) gradient within the trajectory distribution. The proposed method differs from the state-of-the-art (Wang and Lin, 2018) as the cap for the rename registers for each thread is adjusted in real-time according to the policy and state transition from self-play. The proposed method shows an improvement in IPC up to 162.8% in a 4-threaded system, 154.8% in a 6-threaded system and up to 101.7% in an 8-threaded system. The code is now available open source at https://github.com/98k-bot/RL-based-SMT-Register-Renaming-Policy.