Abstract: In cognitive radios, wideband sequential sensing plays an important role: by adaptively allocating sensing resources, it can quickly identify temporarily available transmission opportunities. This paper proposes a Markov decision process for modelling the optimal control of sequential sensing, providing a general formulation that captures various practical features, including sampling cost, sensing requirements, and sensing budget. To solve for the optimal sensing policy, a model-augmented deep reinforcement learning algorithm is proposed, which achieves higher learning stability and efficiency than conventional reinforcement learning algorithms.