Many tasks, such as hitting a baseball, swinging a fly swatter, or catching a football, can be viewed as one-time actions, where the goal is to control the timing and the parameters of a single action to achieve the best result. For many one-time motion problems it is difficult to obtain the optimal policy by solving a model, so model-free reinforcement learning has an advantage. However, although reinforcement learning has developed rapidly, there is currently no general algorithm architecture for one-time motion problems. We decompose the one-time motion problem into an action timing problem and an action parameter problem, and construct a suitable reinforcement learning method for each. We design a combination mechanism that allows the two modules to learn simultaneously by passing estimated values between them while they interact with the environment. We use REINFORCE + DPG when the motion parameter space is continuous, and REINFORCE + Q-learning when it is discrete. To test the algorithm, we designed and implemented an aircraft bombing simulation environment. The test results show that the algorithm converges quickly and stably, and is robust to different time steps and observation errors.
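To make the timing/parameter decomposition concrete, the following is a minimal sketch of the discrete case (REINFORCE for timing, Q-learning for the parameter). It is an illustration under stated assumptions, not the method as implemented in this work: the one-dimensional toy task, the constants `N_STEPS`, `OFFSETS`, and `TARGET`, the tabular policy `theta`, the running baseline, and in particular the choice to train the timing policy on the parameter module's Q estimate are all hypothetical choices made for the example.

```python
import math
import random

# Hypothetical toy task: an agent moves one cell per time step along a line and
# may act exactly once. Acting at step t with offset index k lands at t + OFFSETS[k].
N_STEPS = 10         # time steps available in one episode (assumed)
OFFSETS = [0, 1, 2]  # discrete action parameter (assumed)
TARGET = 6           # target position (assumed)


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def run_episode(theta, q, rng, eps=0.1):
    """Roll out one episode; return (timing decisions, parameter index, reward)."""
    steps = []
    for t in range(N_STEPS):
        # Timing module: Bernoulli policy "act now" with probability sigmoid(theta[t]).
        fired = rng.random() < sigmoid(theta[t])
        steps.append((t, fired))
        if fired:
            # Parameter module: epsilon-greedy choice over Q[t].
            if rng.random() < eps:
                k = rng.randrange(len(OFFSETS))
            else:
                k = max(range(len(OFFSETS)), key=lambda i: q[t][i])
            reward = -abs(t + OFFSETS[k] - TARGET)
            return steps, k, reward
    # The agent never acted: fixed penalty (assumed).
    return steps, None, -float(N_STEPS)


def train(episodes=2000, lr_theta=0.1, lr_q=0.2, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_STEPS
    q = [[0.0] * len(OFFSETS) for _ in range(N_STEPS)]
    baseline = 0.0
    for _ in range(episodes):
        steps, k, reward = run_episode(theta, q, rng)
        signal = reward
        if k is not None:
            t_act = steps[-1][0]
            # Q-learning update; the action ends the episode, so the target is the reward.
            q[t_act][k] += lr_q * (reward - q[t_act][k])
            # Assumed coupling: the timing module learns from the parameter
            # module's value estimate rather than from the raw reward.
            signal = q[t_act][k]
        advantage = signal - baseline
        for t, fired in steps:
            p = sigmoid(theta[t])
            grad_log_pi = (1.0 - p) if fired else -p  # d log pi / d theta[t]
            theta[t] += lr_theta * advantage * grad_log_pi  # REINFORCE step
        baseline += 0.05 * (signal - baseline)
    return theta, q
```

Calling `train()` returns the timing logits and the Q table. For a continuous parameter space, the Q table would be replaced by a deterministic actor and a critic in the DPG style, with the critic's estimate playing the role that `q[t_act][k]` plays here.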