In this paper, we consider a millimeter-wave multiple-input single-output tracking system in which the time-varying angle of departure (AoD) is assumed to evolve according to a discrete-state Markov process. Depending on whether the associated AoD transition function is available, we propose two different training beam sequence design approaches. Specifically, when the AoD transition function is available, we leverage the maximum a posteriori criterion to estimate the updated AoD in each beam tracking period. Since it is infeasible to derive an explicit expression for the resulting estimation error rate, we turn to its upper bound, which admits a closed-form expression and is therefore used as the objective function for optimizing the training beam sequence. Given the complicated objective function and the unit-modulus constraints imposed by the analog phase shifters, we resort to a particle swarm algorithm to solve the formulated optimization problem. When the AoD transition function is unavailable, we turn to the maximum likelihood criterion for AoD estimation. To cope with the unknown AoD transition function, we reformulate the beam tracking problem as a partially observable Markov decision process and develop an actor-critic reinforcement learning framework to obtain an efficient training beam sequence design. Numerical results demonstrate the superiority of the proposed training beam sequence design approaches in both cases.
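
To make the distinction between the two estimation criteria concrete, the following Python sketch contrasts maximum a posteriori and maximum likelihood AoD estimation over a discrete angle grid. It is only an illustration under simplifying assumptions that are not taken from the paper: a half-wavelength uniform linear array, a known unit channel gain, complex Gaussian noise, and randomly drawn unit-modulus training beams in place of an optimized sequence. All function names, the angle grid, and the prior row are hypothetical.

```python
import cmath
import math
import random


def steering(n_ant, theta):
    """Steering vector of an assumed half-wavelength ULA for AoD theta (radians)."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_ant)]


def random_beams(n_beams, n_ant, rng):
    """Random unit-modulus beams (scaled by 1/sqrt(N)); a stand-in for a designed sequence."""
    return [[cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) / math.sqrt(n_ant)
             for _ in range(n_ant)] for _ in range(n_beams)]


def signature(beams, theta):
    """Noise-free received samples a(theta)^H f_t, one per training beam f_t."""
    a = steering(len(beams[0]), theta)
    return [sum(an.conjugate() * fn for an, fn in zip(a, f)) for f in beams]


def log_likelihoods(y, beams, grid, noise_var):
    """Gaussian log-likelihood of y (up to a constant) for each candidate AoD."""
    scores = []
    for theta in grid:
        s = signature(beams, theta)
        scores.append(-sum(abs(yt - st) ** 2 for yt, st in zip(y, s)) / noise_var)
    return scores


def ml_estimate(y, beams, grid, noise_var):
    """Maximum likelihood: pick the grid index with the largest likelihood."""
    ll = log_likelihoods(y, beams, grid, noise_var)
    return max(range(len(grid)), key=lambda k: ll[k])


def map_estimate(y, beams, grid, noise_var, prior):
    """Maximum a posteriori: add the log prior, e.g. one row of a transition matrix."""
    ll = log_likelihoods(y, beams, grid, noise_var)
    # Floor the prior to avoid log(0) for transitions with zero probability.
    post = [l + math.log(max(p, 1e-300)) for l, p in zip(ll, prior)]
    return max(range(len(grid)), key=lambda k: post[k])


if __name__ == "__main__":
    rng = random.Random(0)
    n_states, n_ant, n_beams, noise_var = 8, 8, 4, 0.5
    # Hypothetical grid: AoDs whose sines are evenly spaced in (-1, 1).
    grid = [math.asin(-1.0 + (2 * k + 1) / n_states) for k in range(n_states)]
    beams = random_beams(n_beams, n_ant, rng)
    true_idx = 3
    # Hypothetical transition row: the AoD mostly stays put or moves to a neighbor.
    prior = [0.02] * n_states
    prior[true_idx - 1], prior[true_idx], prior[true_idx + 1] = 0.2, 0.5, 0.2
    # Complex Gaussian noise with total variance noise_var per sample.
    std = math.sqrt(noise_var / 2.0)
    y = [s + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
         for s in signature(beams, grid[true_idx])]
    print("ML index:", ml_estimate(y, beams, grid, noise_var))
    print("MAP index:", map_estimate(y, beams, grid, noise_var, prior))
```

With a uniform prior the two estimators coincide. The transition row matters most when the observations are noisy or the beams discriminate poorly between neighboring angles, which is the regime where a known transition function can help.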