Quantitative trading takes advantage of mathematical functions for automatically making stock or futures trading decisions. Specifically, various trading strategies that proposed by human-experts are associated with weight hyper-parameters to determine the probability of selecting a specific strategy according to market conditions. Prior work manually adjusting the weight hyper-parameters is error-prone, because the essential advantage of quantitative trading, i.e., automation, is lost. In this paper, we propose a dynamic parameter tuning algorithm, i.e., TradeBot, based on bandit learning for quantitative trading. We consider sequentially selecting hyper-parameters of rules for trading as a bandit game, where a set of hyper-parameters of trading rule is considered as an action. A novel reward-agnostic Upper Confidence Bound bandit method is proposed to solve the automatically trading problem with a reward function estimated by inverse reinforcement learning. Experimental results on China Commodity Futures Market Data show state-of-the-art performance. To our best knowledge, this is one of the first work deployed in the online trading system via reinforcement learning, in published literature.
Read full abstract