As a crucial financial task, portfolio selection has attracted substantial interest within the artificial intelligence community, and reinforcement learning is particularly well suited to it. Traditional reinforcement learning typically depends on implicit estimation of future states because the state transitions are unknown. In financial settings, however, where trading actions are assumed not to influence asset prices, the transition probabilities are predetermined. This allows the reinforcement-learning formulation of portfolio optimization to be decomposed into two tasks: prediction and profit-policy optimization. Motivated by these observations, we propose a novel reinforcement learning framework with deterministic state transition probabilities, comprising three modules: feature extraction, prediction, and profit policy. To model assets more effectively and robustly, we capture their temporal features, relational features, and the market state, introducing a patch-wise correlation method and an attribute-based gate to enhance feature extraction. In the profit-policy module, we adopt a deterministic strategy and train the policy network with a recursive reinforcement learning method based on Monte Carlo sampling, enabling dynamic adjustment of asset investment weights to maximize cumulative returns. Extensive experiments on cryptocurrency datasets demonstrate the superior performance of our approach, which achieves 36.6%-75.6% improvements on the main metrics.
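To make the profit-policy idea concrete, the following is a minimal illustrative sketch, not the paper's actual method: a deterministic softmax policy maps (synthetic, randomly generated) asset features to portfolio weights, and its parameters are trained by gradient ascent on the episode's cumulative log return, with the gradient estimated by central finite differences. All names, dimensions, and the return-generating process here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, F = 60, 4, 3                      # periods, assets, features per asset (assumed)
X = rng.normal(size=(T, N, F))          # hypothetical asset features
# hypothetical per-period asset returns, weakly linked to the first feature
R = 0.001 * X[:, :, 0] + 0.01 * rng.normal(size=(T, N))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def episode_log_return(theta):
    """Cumulative log return of the deterministic policy w_t = softmax(X_t @ theta)."""
    total = 0.0
    for t in range(T):
        w = softmax(X[t] @ theta)       # long-only portfolio weights, summing to 1
        total += np.log1p(w @ R[t])     # compound the period's portfolio return
    return total

# Train by gradient ascent; the gradient of the episode objective is
# estimated with central finite differences (a stand-in for backprop).
theta = np.zeros(F)
for _ in range(300):
    grad = np.zeros(F)
    for i in range(F):
        e = np.zeros(F)
        e[i] = 1e-4
        grad[i] = (episode_log_return(theta + e) - episode_log_return(theta - e)) / 2e-4
    theta += 0.1 * grad

print(episode_log_return(theta), episode_log_return(np.zeros(F)))
```

The trained policy's cumulative log return should exceed that of the uniform-weight baseline (`theta = 0`) on this synthetic episode; in the paper's full framework, the features would come from the feature-extraction and prediction modules rather than random draws.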