Existing approaches to joint beam selection and power allocation (JBSPA) for multiple target tracking (MTT) typically allocate resources by considering only the MTT performance at the current tracking time instant, which cannot guarantee long-term MTT performance. If JBSPA accounts for tracking performance at both the current and future tracking time instants, the resulting allocation can in principle improve long-term tracking performance and tracking robustness. Motivated by this, this article formulates JBSPA as a model-free Markov decision process (MDP) and solves it with a data-driven method, namely deep reinforcement learning (DRL). With DRL, the optimal policy is obtained by learning from a large volume of interaction data between the DRL agent and the environment. In addition, to ensure reliable prediction of target state information in maneuvering-target scenarios, a data-driven method is developed that combines long short-term memory (LSTM) with a Gaussian mixture model (GMM), referred to as LSTM-GMM. This method predicts the target state by learning the regularity of the nonlinear state transitions of maneuvering targets, with the GMM describing the target motion uncertainty in the LSTM. Simulation results demonstrate the effectiveness of the proposed method.
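
The difference between allocating for the current instant only and allocating for long-term performance can be illustrated with a discounted return, the quantity an MDP policy is commonly trained to maximize. The following is a minimal sketch and not the article's formulation: the function name `discounted_return`, the discount factor `gamma`, and the per-step reward values are all hypothetical, with each reward standing in for a negative tracking cost.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical per-step rewards (e.g. negative tracking error) for two allocations.
myopic = [-1.0, -4.0, -6.0]      # best at the current instant, degrades afterwards
farsighted = [-2.0, -2.0, -2.0]  # slightly worse now, stable afterwards

# Judged on the current instant alone, the myopic allocation looks better.
print(myopic[0] > farsighted[0])

# Judged on the discounted return over the horizon, the ranking reverses.
print(discounted_return(myopic) < discounted_return(farsighted))
```

Under these assumed numbers, an allocation that wins at the current instant loses once future instants are weighted in, which is the motivation for treating JBSPA as a sequential decision problem.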