In this paper, we describe a novel iterative procedure called SISTA to learn the underlying cost in optimal transport problems. SISTA is a hybrid of two classical methods: coordinate descent (the "S" in SISTA, for Sinkhorn's algorithm) and proximal gradient descent (ISTA, the iterative shrinkage-thresholding algorithm). It alternates between a phase of exact minimization over the transport potentials and a phase of proximal gradient descent over the parameters of the transport cost. We prove that this method converges linearly, and we show on simulated examples that it is significantly faster than both coordinate descent and ISTA alone. We apply it to the estimation of a model of migration, which predicts the flow of migrants using country-specific characteristics and pairwise measures of dissimilarity between countries. This application illustrates the effectiveness of machine learning methods in the quantitative social sciences.
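The alternation described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's exact specification. It assumes an entropically regularized model in which the surplus (the negative of the cost) is linear in the parameters, `Phi[i, j] = sum_k beta[k] * D[i, j, k]`, and it assumes an L1 penalty `lam * ||beta||_1`. The convex objective minimized is `sum_i p_i u_i + sum_j q_j v_j + sum_ij exp(Phi_ij - u_i - v_j) - sum_ij pi_hat_ij Phi_ij + lam * ||beta||_1`. The feature array `D`, the function names `sista` and `soft_threshold`, and the step-size rule are all hypothetical. Each iteration performs one exact Sinkhorn sweep over the potentials `u`, `v` and then one ISTA step (a gradient step followed by soft-thresholding) over `beta`.

```python
import numpy as np


def _lse(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (the shrinkage step of ISTA)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sista(pi_hat, D, lam=0.0, n_iter=20000, step=None):
    """Hypothetical Sinkhorn/ISTA hybrid for a linearly parametrized surplus.

    pi_hat : (n, m) observed coupling whose entries sum to 1.
    D      : (n, m, K) assumed pairwise features; surplus Phi = D @ beta.
    lam    : weight of the assumed L1 penalty on beta.
    """
    p, q = pi_hat.sum(axis=1), pi_hat.sum(axis=0)
    n, m, K = D.shape
    beta, u, v = np.zeros(K), np.zeros(n), np.zeros(m)
    if step is None:
        # Conservative step: after a Sinkhorn sweep the fitted coupling has
        # unit mass, so the beta-Hessian is bounded by max_ij ||D_ij||^2.
        step = 1.0 / np.max(np.sum(D**2, axis=2))
    for _ in range(n_iter):
        phi = D @ beta  # (n, m) surplus at the current parameters
        # Phase 1: exact minimization over the potentials (closed form).
        u = _lse(phi - v[None, :], axis=1) - np.log(p)
        v = _lse(phi - u[:, None], axis=0) - np.log(q)
        # Phase 2: one proximal gradient (ISTA) step over beta.
        pi = np.exp(phi - u[:, None] - v[None, :])
        grad = np.einsum("ij,ijk->k", pi - pi_hat, D)  # fitted minus observed moments
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta, u, v


# Usage on simulated data: generate pi_hat from the model at a known beta
# by running Sinkhorn's algorithm alone, then try to recover that beta.
rng = np.random.default_rng(0)
n, m, K = 5, 6, 3
D = rng.uniform(-1.0, 1.0, size=(n, m, K))
beta_true = np.array([1.0, 0.0, -0.5])
p0, q0 = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
phi0 = D @ beta_true
u0, v0 = np.zeros(n), np.zeros(m)
for _ in range(2000):
    u0 = _lse(phi0 - v0[None, :], axis=1) - np.log(p0)
    v0 = _lse(phi0 - u0[:, None], axis=0) - np.log(q0)
pi_hat = np.exp(phi0 - u0[:, None] - v0[None, :])

beta_est, u_est, v_est = sista(pi_hat, D, lam=0.0)
```

With `lam=0.0` the estimate should approach `beta_true`, because `pi_hat` was generated exactly from the assumed model. A large `lam` should instead shrink every coefficient to zero, which is the sparsity-inducing role of the proximal step.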