To pursue profit from the dynamic, complex, and noisy stock markets, various efforts utilizing deep learning methods to forecast asset price movements have sprung up. We observe that there are two issues in the current work. Firstly, there exists a discrepancy between the forecasting target and actual profitability, as achieving better forecasting results does not necessarily guarantee higher profits. Secondly, many existing methods heavily rely on prior knowledge during the forecasting process, which entails the need for information gathering and may not adapt well to dynamic and complex market conditions. For the first issue, we design a novel reward learning loss function as an optimization object to better model the bridge between the forecasting target and the profit from the trading process. To solve the second issue, we propose a structure based on multi-head attention to model the inter-stock relations directly from trading data without relying on prior knowledge. Also, we present a simple time-asynchronous attention-based method to model the lead–lag phenomenon in the market. We conduct experiments using over 600 stocks of the CSI100, CSI300, and CSI500 indexes from 2010 to 2020 with five strong baselines. The experimental results demonstrate that our methods achieve annualized returns of 5%, 10%, and 13% for long and 5%, 6%, and 8% for short above the optimal baseline results on the three indexes. Further analysis shows that our RL-Loss is better than classic PR-Loss, and the inter-stock relation modeling methods proposed without prior knowledge are effective.