Stochastic differential equations (SDEs) are widely used to model systems that are subject to random perturbations or that have chaotic dynamics at faster time scales. The time evolution of the probability distribution of the solution of an SDE is described by the Fokker–Planck equation, a second-order parabolic partial differential equation (PDE). Previous work combined artificial neural networks with Monte Carlo data to solve stationary Fokker–Planck equations. This paper extends that approach to time-dependent Fokker–Planck equations. The main focus is on algorithms for training a neural network with multi-scale loss functions. In addition, a new approach to sampling collocation points is proposed. Several 1D and 2D numerical examples are presented.
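
For concreteness, the time-dependent Fokker–Planck equation referred to above can be written in its standard textbook form. The notation here is an assumption for illustration and is not taken from the paper: $X_t \in \mathbb{R}^n$ is the state, $f$ the drift, $\sigma$ the diffusion coefficient matrix, $W_t$ a Wiener process, and $u(x,t)$ the probability density of $X_t$, assuming one exists.

```latex
% Ito SDE (assumed notation: drift f, diffusion coefficient sigma)
\[
  \mathrm{d}X_t = f(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t
\]
% Corresponding Fokker--Planck equation for the density u(x,t) of X_t
\[
  \frac{\partial u}{\partial t}
  = -\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\bigl(f_i(x)\,u\bigr)
  + \frac{1}{2}\sum_{i,j=1}^{n}
    \frac{\partial^2}{\partial x_i\,\partial x_j}\bigl(D_{ij}(x)\,u\bigr),
  \qquad D = \sigma\sigma^{\mathsf{T}}
\]
```

The stationary case treated in the previous work corresponds to setting $\partial u / \partial t = 0$.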