The problem of fair machine learning has drawn much attention over the last few years, and the bulk of the proposed solutions are, in essence, empirical. However, algorithmic fairness also raises important conceptual issues that would remain unaddressed if one relied entirely on empirical considerations. Herein, I will argue that the current debate has developed an empirical framework that has made important contributions to algorithmic decision-making, such as new techniques to discover and prevent discrimination, additional assessment criteria, and analyses of the interaction between fairness and predictive accuracy. However, the same framework has also raised higher-order issues regarding the translation of fairness into metrics and quantifiable trade-offs. Although the (empirical) tools developed so far are essential for addressing discrimination encoded in data and algorithms, their integration into society elicits key (conceptual) questions, such as: What kinds of assumptions and decisions underlie the empirical framework? How do the results of the empirical approach enter public debate? What kind of reflection and deliberation should stakeholders exercise over the available fairness metrics? I will outline the empirical approach to fair machine learning, i.e., how the problem is framed and addressed, and suggest that there are important non-empirical issues that must also be tackled. While this work focuses on the problem of algorithmic fairness, its lessons extend to other conceptual problems in the analysis of algorithmic decision-making, such as privacy and explainability.