This work investigates a different point of view on Markov decision processes by reinterpreting them as a randomized shortest paths problem on a bipartite graph, thereby establishing links with entropy-regularized reinforcement learning. In this bipartite graph, the set of states forms the “left” nodes and the set of actions the “right” nodes. In that context, the action-to-state transition probabilities are provided by the environment, whereas the state-to-action probabilities correspond to the (stochastic) policy to be found. The randomized shortest paths formalism (minimizing the expected cost to the goal state subject to a Shannon or Tsallis relative entropy regularization) is then readily applied to this bipartite structure, providing a possibly sparse stochastic policy that interpolates between a least-cost and a purely random policy. The algorithm computing the policy is closely related to the dual linear programming formulation of the Markov decision process, to which the relative entropy regularization term, multiplied by a scaling factor balancing exploitation and exploration (the temperature), is added. It is derived from well-known techniques of discrete optimal control, relying on the backward computation of costates (Lagrange multipliers). In summary, the proposed algorithm allows the design of optimal stochastic, yet still sparse, policies ranging from purely rational to purely random behavior, depending on the temperature parameter.
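As an illustrative sketch of the Shannon-entropy variant described above (with assumed notation: $c(s_t,a_t)$ for the immediate cost, $T>0$ for the temperature, and $\pi_0$ for a reference, e.g. uniform, policy), the regularized objective can be written as
\[
\min_{\pi}\;\;
\mathbb{E}_{\pi}\!\Big[\sum_{t\ge 0} c(s_t,a_t)\Big]
\;+\;
T\,\mathbb{E}_{\pi}\!\Big[\sum_{t\ge 0} \log \frac{\pi(a_t \mid s_t)}{\pi_0(a_t \mid s_t)}\Big],
\]
where the expectation is taken over trajectories reaching the goal state, the first term is the expected cost to the goal, and the second is the relative entropy (Kullback-Leibler) regularizer. Letting $T \to 0$ recovers the least-cost (purely rational) policy, while a large $T$ drives the policy toward the reference (purely random) behavior.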