A selfish learner seeks to maximize their own success, disregarding others. When success is measured as payoff in a game played against another learner, mutual selfishness typically fails to produce the optimal outcome for a pair of individuals. However, learners often operate in populations, and each learner may have a limited duration of interaction with any other individual. Here, we compare selfish learning in stable pairs to selfish learning with stochastic encounters in a population. We study gradient-based optimization in repeated games like the prisoner’s dilemma, which feature multiple Nash equilibria, many of which are suboptimal. We find that myopic, selfish learning, when distributed in a population via ephemeral encounters, can reverse the dynamics that occur in stable pairs. In particular, when there is flexibility in partner choice, selfish learning in large populations can produce optimal payoffs in repeated social dilemmas. This result holds for the entire population, not just for a small subset of individuals. Furthermore, as the population size grows, the timescale to reach the optimal population payoff remains finite in the number of learning steps per individual. While it is not universally true that interacting with many partners in a population improves outcomes, this form of collective learning achieves optimality for several important classes of social dilemmas. We conclude that naïve learning can be surprisingly effective in populations of individuals navigating conflicts of interest.
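To make the setup concrete, the following is a minimal sketch of selfish, gradient-based learning in a repeated social dilemma. It is an illustration under stated assumptions, not the model analyzed in this work. It assumes a donation game (a form of the prisoner's dilemma) with benefit `b` and cost `c`, memory-one strategies, a discount factor `delta`, and a finite-difference gradient. The names `payoff`, `selfish_step`, and `learn`, and all parameter values, are hypothetical. The `reshuffle` flag only switches between stable pairs and uniformly random re-pairing at each step; it does not model partner choice.

```python
import numpy as np

def payoff(p, q, b=2.0, c=1.0, delta=0.95):
    """Discounted average payoff to the player using p against q.

    A strategy is (p0, p_CC, p_CD, p_DC, p_DD): the initial cooperation
    probability, then the cooperation probability after each outcome,
    written from the strategy owner's own perspective.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    pr = p[1:]               # focal player's responses in CC, CD, DC, DD
    qr = q[[1, 3, 2, 4]]     # opponent sees CD and DC swapped
    # Transition matrix over (CC, CD, DC, DD) from the focal perspective.
    M = np.column_stack([pr * qr, pr * (1 - qr),
                         (1 - pr) * qr, (1 - pr) * (1 - qr)])
    v0 = np.array([p[0] * q[0], p[0] * (1 - q[0]),
                   (1 - p[0]) * q[0], (1 - p[0]) * (1 - q[0])])
    u = np.array([b - c, -c, b, 0.0])   # donation-game payoffs
    return (1 - delta) * v0 @ np.linalg.solve(np.eye(4) - delta * M, u)

def selfish_step(p, q, lr=0.1, eps=1e-6):
    """One gradient-ascent step on the focal player's own payoff only."""
    p = np.asarray(p, float)
    grad = np.zeros_like(p)
    for i in range(len(p)):
        d = np.zeros_like(p)
        d[i] = eps
        # Central finite difference (assumption: no analytic gradient used).
        grad[i] = (payoff(p + d, q) - payoff(p - d, q)) / (2 * eps)
    return np.clip(p + lr * grad, 0.0, 1.0)

def learn(strategies, steps, reshuffle, rng):
    """Simultaneous selfish updates in stable pairs or with random re-pairing."""
    pop = np.array(strategies, float)
    order = rng.permutation(len(pop))
    for _ in range(steps):
        if reshuffle:                      # ephemeral encounters
            order = rng.permutation(len(pop))
        new = pop.copy()
        for i, j in zip(order[0::2], order[1::2]):
            new[i] = selfish_step(pop[i], pop[j])
            new[j] = selfish_step(pop[j], pop[i])
        pop = new
    return pop

rng = np.random.default_rng(0)
start = rng.uniform(size=(6, 5))           # six learners, random strategies
stable = learn(start, steps=20, reshuffle=False, rng=rng)
mixed = learn(start, steps=20, reshuffle=True, rng=rng)
```

Comparing `stable` and `mixed` for particular parameters shows only how the two pairing schedules differ in this toy setting; it should not be read as reproducing the results summarized above.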