A smart grid integrates advanced sensors, efficient measurement methods, modern control technologies, and other techniques and devices to achieve safe, efficient, and economical operation of the grid. However, the diversified and open environment of a smart grid leaves both its energy and its information flows vulnerable to malicious attacks. The data integrity attack is a representative cyber-physical attack. It can severely disrupt grid operation because it can bypass traditional detection mechanisms through careful adjustment of the attack vector. In this paper, we first present an attack strategy against the dynamic state estimation of the power grid from the adversary's perspective. We then formulate data integrity attack detection, which is inherently a sequential decision-making problem, as a partially observable Markov decision process (POMDP). Next, we propose a deep reinforcement learning (DRL)-based approach to detect data integrity attacks. It uses a Long Short-Term Memory (LSTM) layer to extract state features from previous time steps when determining whether the system is currently under attack. Moreover, noisy networks are employed to ensure effective exploration, which prevents the agent from getting stuck in a suboptimal policy. Multi-step learning is adopted to improve the accuracy of Q-value estimation. To address the sparse-reward problem, prioritized experience replay is adopted to improve training efficiency. Simulation results demonstrate that the proposed detection approach outperforms the benchmarks on the comparison metrics: delay error rate and false rate.

Note to Practitioners: In this paper, we present a deep reinforcement learning-based algorithm to defend smart grids against data integrity attacks.
Most previous works discretize the system states and use only the current state information to identify whether the system is under attack. As a result, the detection policy may entirely ignore the continuously changing characteristics of the grid states, which leads to poor detection performance. Moreover, attacked system states account for only a small fraction of all grid operating states. The probability of sampling an experience that contains an attack state is therefore extremely small, which limits the learning efficiency of previous RL-based detection approaches. To improve detection accuracy, we first present an attack strategy against the power grid's dynamic state estimation from the adversary's perspective. We then formulate the attack detection problem as a partially observable Markov decision process (POMDP). Building on this model, we propose a deep reinforcement learning-based detection approach that incorporates an LSTM network to extract features of the system states from previous time steps and determine whether the system is currently under attack. To address the sparse-reward problem, prioritized experience replay is used to improve learning efficiency. Experiments demonstrate the effectiveness of the proposed detection scheme compared with benchmarks in terms of both detection delay and accuracy. In summary, the proposed scheme helps defend against data integrity attacks without requiring prior knowledge of the adversary's strategy and can be readily applied to real-world smart grid security management systems.
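The multi-step learning mentioned in the abstract replaces the one-step Q-learning target with an n-step return: G_t = r_{t+1} + γ r_{t+2} + ... + γ^(n-1) r_{t+n} + γ^n max_a Q(s_{t+n}, a). The sketch below computes this standard target in plain Python. The function name `n_step_target`, the bootstrap Q-values (which in practice would typically come from a target network), and the numbers in the usage line are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_q: Sequence[float],
                  gamma: float, done: bool) -> float:
    """Standard n-step Q-learning target (hypothetical helper, not the paper's code).

    rewards: r_{t+1}, ..., r_{t+n} collected after taking action a_t in state s_t.
    bootstrap_q: Q(s_{t+n}, a) for every action a, e.g. from a target network.
    done: True if the episode ended within these n steps; then there is no bootstrap term.
    """
    target = 0.0
    # Discounted sum of the n observed rewards.
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    # Bootstrap from the greedy value of the state reached after n steps.
    if not done:
        target += (gamma ** len(rewards)) * max(bootstrap_q)
    return target


# Usage: a 3-step return in which the only reward arrives at the last step.
print(n_step_target([0.0, 0.0, 1.0], [0.5, 2.0], gamma=0.9, done=False))
```

In this usage line the target is 0.9^2 × 1 + 0.9^3 × 2 = 0.81 + 1.458 = 2.268, up to floating-point rounding. When rewards are sparse, the n-step return lets a single non-zero reward reach the targets of the preceding n transitions in one update. This is one common motivation for multi-step learning.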
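Prioritized experience replay samples transitions in proportion to a priority instead of uniformly. Rare but informative transitions, such as the few that contain attack states, are therefore replayed more often. Below is a minimal list-based sketch of the proportional variant. It assumes the following quantities:

- priorities p_i = |δ_i| + ε, where δ_i are TD errors;
- sampling probability P(i) = p_i^α / Σ_k p_k^α;
- importance-sampling weights w_i = (N · P(i))^(-β), normalized by the batch maximum.

Several details are illustrative assumptions, not the paper's buffer design: the class name `PrioritizedReplayBuffer`, the defaults α = 0.6 and β = 0.4, and the O(N) sampling, where a sum tree is the usual efficient choice.

```python
import random
from typing import Any, List, Sequence, Tuple


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical sketch, list-based, O(N) sampling)."""

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6) -> None:
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0        # next slot to write once the ring buffer is full

    def add(self, transition: Any) -> None:
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, rng: random.Random
               ) -> Tuple[List[int], List[Any], List[float]]:
        n = len(self.data)
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Draw indices with probability P(i) = p_i^alpha / sum_k p_k^alpha.
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights counteract the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        w_max = max(weights)
        return idxs, [self.data[i] for i in idxs], [w / w_max for w in weights]

    def update_priorities(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        # Priority p_i = |delta_i| + eps, using the latest TD errors.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps


# Usage: 99 low-error "normal" transitions and one rare, high-error "attack" transition.
rng = random.Random(0)
buf = PrioritizedReplayBuffer(capacity=100)
for t in range(100):
    buf.add(("normal", t))
buf.update_priorities(list(range(100)), [0.01] * 100)
buf.update_priorities([42], [5.0])
idxs, batch, weights = buf.sample(1000, rng)
print(idxs.count(42))
```

With these settings, the high-priority transition has a sampling probability of about 0.3 per draw, compared with 0.01 under uniform sampling. The importance-sampling weights then scale down its contribution to the loss so that the learned Q-values are not biased toward it.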