In this paper, we propose a novel parameter estimation method based on a two-stage neural network (PENN). It jointly estimates the parameters of a stochastic differential equation (SDE) driven by Lévy noise from a discretely sampled trajectory. The first stage is a long short-term memory (LSTM) network that extracts compact, time-irrelevant deep features from the trajectory. A fully connected neural network then refines these deep features by integrating time information. This architecture enables our method to process trajectories with variable lengths and time spans. Representative SDEs are used to demonstrate the effectiveness of our approach: the Ornstein–Uhlenbeck process, a genetic toggle switch model and the bistable Duffing system. The numerical results suggest that PENN can simultaneously estimate the parameters of both the system and the Lévy noise, and that it does so faster and more accurately than traditional estimation methods. Moreover, the method generalizes readily to different SDEs with flexible settings of sample observation.
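The following minimal NumPy sketch makes the two-stage idea concrete. It covers the forward pass only and rests on assumptions of ours, not details from the paper:

- a single-layer LSTM whose final hidden state serves as the deep feature;
- a fully connected head that receives that feature concatenated with the total time span `T` as its only time input;
- random, untrained weights;
- four outputs, a number chosen arbitrarily.

The names `LSTMEncoder`, `FCHead` and `penn_forward` are hypothetical. This is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTMEncoder:
    """Stage 1 (sketch): single-layer LSTM mapping a trajectory of any length
    to a fixed-size feature vector (its final hidden state)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Stacked weights for the input, forget, candidate and output gates
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, size=(4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def __call__(self, x):
        # x has shape (n_steps, input_dim); n_steps may differ between trajectories
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x_t in x:
            z = self.W @ x_t + self.U @ h + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell state
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return h  # fixed-size feature, independent of the trajectory length


class FCHead:
    """Stage 2 (sketch): fully connected network combining the deep feature
    with time information (assumed here to be the total time span T)."""

    def __init__(self, feat_dim, hidden_dim, n_params, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(hidden_dim, feat_dim + 1))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, size=(n_params, hidden_dim))
        self.b2 = np.zeros(n_params)

    def __call__(self, feature, time_span):
        z = np.concatenate([feature, [time_span]])
        a = np.tanh(self.W1 @ z + self.b1)
        return self.W2 @ a + self.b2  # one output per (hypothetical) SDE parameter


def penn_forward(traj, time_span, encoder, head):
    return head(encoder(traj), time_span)


encoder = LSTMEncoder(input_dim=1, hidden_dim=16, rng=rng)
head = FCHead(feat_dim=16, hidden_dim=32, n_params=4, rng=rng)

# Two trajectories with different lengths and time spans
short_traj = rng.normal(size=(200, 1))
long_traj = rng.normal(size=(1000, 1))
print(penn_forward(short_traj, 2.0, encoder, head).shape)
print(penn_forward(long_traj, 10.0, encoder, head).shape)
```

The LSTM folds the whole trajectory into a hidden state of fixed size, so the fully connected head always receives an input of the same dimension. This is what allows trajectories of different lengths and time spans to share one network. Training, the loss function and the paper's exact form of the time input are not shown.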