Urban planners, authorities, and numerous additional players have to deal with challenges related to the rapid urbanization process and its effect on human mobility and transport dynamics. Hence, optimize transportation systems represents a unique occasion for municipalities. Indeed, the quality of transport is linked to economic growth, and by decreasing traffic congestion, the life quality of the inhabitants is drastically enhanced. Most state-of-the-art solutions optimize traffic in specific and small zones of cities (e.g., single intersections) and cannot be used to gather insights for an entire city. Moreover, evaluating such optimized policies in a realistic way that is convincing for policy-makers can be extremely expensive. In our work, we propose a reinforcement learning frameworks to overtake these two limitations. In particular, we use human mobility data to optimize the transport dynamics of three real-world cities (i.e., Berlin, Santiago de Chile, Dakar) and a synthesized one (i.e., SynthTown). To this end, we transform the transportation dynamics’ simulator MATSim into a realistic reinforcement learning environment able to optimize and evaluate transportation policies using agents that perform realistic daily activities and trips. In this way, we can assess transportation policies in a manner that is convincing for policy-makers. Finally, we develop a model-based reinforcement learning algorithm that approximates MATSim dynamics with a Partially Observable Discrete Event Decision Process (PODEDP) and, with respect to other state-of-art policy optimization techniques, can scale on big transportation data and find optimal policies also on a city-scale.