We study the relationship between optimal transport theory and stochastic thermodynamics for the Fokker-Planck equation. We show that the entropy production is bounded from below by an action measured by the path length in terms of the $L^2$-Wasserstein distance. Because the $L^2$-Wasserstein distance is a geometric measure in optimal transport theory, this result gives the entropy production a geometric interpretation. Based on this interpretation, we obtain a thermodynamic trade-off relation between the transition time and the entropy production. This trade-off relation can be regarded as a thermodynamic speed limit, and it gives a tighter bound on the entropy production. We also discuss stochastic thermodynamics for a subsystem and derive a lower bound on the partial entropy production as a generalization of the second law of information thermodynamics. Our formalism also provides a geometric picture of the optimal protocol that minimizes the entropy production. We illustrate these results with an optimal stochastic heat engine and show a geometrical bound on the efficiency.
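
As a rough numerical illustration of the trade-off between transition time and entropy production, the sketch below evaluates a Wasserstein-type bound for one-dimensional Gaussian distributions, for which the $L^2$-Wasserstein distance has the closed form $\mathcal{W}^2 = (m_1 - m_0)^2 + (s_1 - s_0)^2$. The abstract does not state the bound explicitly, so the form $\Sigma \geq \mathcal{W}^2/(\mu T \tau)$ used here is an assumption, as are the function names and the parameters `mobility` ($\mu$), `temperature` ($T$, with $k_{\rm B} = 1$), and `tau` ($\tau$).

```python
import math


def w2_gaussian_1d(m0, s0, m1, s1):
    """L^2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2) in 1D.

    Uses the closed form W^2 = (m1 - m0)^2 + (s1 - s0)^2, valid for
    one-dimensional Gaussians with standard deviations s0, s1 >= 0.
    """
    return math.sqrt((m1 - m0) ** 2 + (s1 - s0) ** 2)


def entropy_production_lower_bound(w2, tau, mobility=1.0, temperature=1.0):
    """Assumed bound: Sigma >= W^2 / (mu * T * tau), with k_B = 1.

    The prefactor 1 / (mu * T) is an assumption for illustration;
    it is not given in the abstract.
    """
    return w2 ** 2 / (mobility * temperature * tau)


def min_transition_time(w2, sigma, mobility=1.0, temperature=1.0):
    """Same assumed bound rearranged as a speed limit: tau >= W^2 / (mu * T * Sigma)."""
    return w2 ** 2 / (mobility * temperature * sigma)


if __name__ == "__main__":
    # Hypothetical initial and final distributions: N(0, 1^2) -> N(3, 5^2).
    w2 = w2_gaussian_1d(0.0, 1.0, 3.0, 5.0)
    print("W2 distance:", w2)
    print("Lower bound on entropy production for tau = 5:",
          entropy_production_lower_bound(w2, tau=5.0))
    print("Minimum transition time for Sigma = 5:",
          min_transition_time(w2, sigma=5.0))
```

Under this assumed form, halving the transition time doubles the minimum entropy production for the same pair of distributions, which is the sense in which the relation acts as a speed limit.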