Growing awareness of global challenges and increasing pressures on the farming sector, including the urgent need to rapidly cut greenhouse gas (GHG) emissions, underscore the need for sustainable production, which is particularly relevant for dairy production systems. Comparing dairy production systems across the three sustainability dimensions is a considerable challenge, notably because of the heterogeneity of production conditions in Europe. To address this, we developed an ex post multicriteria assessment tool that takes a holistic approach across the three sustainability dimensions. The tool is based on the DEXi framework, which combines a tree-shaped hierarchical decision model with expert judgment; we therefore named it the DEXi-Dairy tool. For each sustainability dimension, qualitative attributes were defined and organized into themes, sub-themes, and indicators. Their selection was guided by three objectives: (i) to better describe the main challenges faced by European dairy production systems, (ii) to identify synergies and trade-offs across sustainability dimensions, and (iii) to contribute to the identification of GHG mitigation strategies at the farm level. Qualitative scales were defined for each theme, sub-theme, and indicator, together with the weighting factors used to aggregate each level of the tree. Based on the selected indicators, a list of farm data requirements was compiled to populate the sustainability tree. The model was then tested on seven case-study farms across Europe. DEXi-Dairy provides a qualitative method that allows different inputs to be compared and the three sustainability dimensions to be evaluated in an integrated manner. By assessing synergies and trade-offs across sustainability dimensions, DEXi-Dairy can reflect the heterogeneity of dairy production systems.
The results indicate that, although trade-offs occasionally occur among the selected sub-themes, certain farming systems tend to achieve higher sustainability scores than others and could therefore serve as benchmarks for further analyses.
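The bottom-up aggregation of qualitative indicators through themes and sub-themes can be illustrated with a minimal sketch. It rests on several assumptions that do not come from DEXi-Dairy itself. All attributes share one hypothetical three-level ordinal scale (`SCALE`). The node names and weights are invented for illustration. Each parent takes the weighted mean of its children's ordinal positions, rounded half-up to the nearest scale level. DEXi models normally express aggregation as tables of qualitative if-then decision rules, so this weighted-mean rounding is a simplified stand-in, not the DEXi algorithm. `Node` and `evaluate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical ordinal scale shared by every attribute (an assumption;
# a real DEXi-Dairy tree may define a different scale per attribute).
SCALE = ("poor", "moderate", "good")


@dataclass
class Node:
    """One attribute of the tree: a dimension, theme, sub-theme, or indicator."""
    name: str
    weight: float = 1.0                 # hypothetical weight relative to siblings
    value: Optional[str] = None         # set directly only on leaf indicators
    children: List[Node] = field(default_factory=list)


def evaluate(node: Node) -> str:
    """Return a node's qualitative value, aggregating its children bottom-up."""
    if not node.children:
        if node.value not in SCALE:
            raise ValueError(f"indicator {node.name!r} has no valid value")
        return node.value
    total_weight = sum(child.weight for child in node.children)
    # Weighted mean of the children's ordinal positions on SCALE.
    score = sum(child.weight * SCALE.index(evaluate(child))
                for child in node.children) / total_weight
    # Round half-up to the nearest scale level (score is never negative).
    return SCALE[int(score + 0.5)]


# Hypothetical example farm; names, weights, and values are illustrative only.
farm = Node("farm", children=[
    Node("environmental", weight=0.4, children=[
        Node("GHG emission intensity", weight=0.5, value="poor"),
        Node("nutrient balance", weight=0.5, value="good"),
    ]),
    Node("economic", weight=0.3, value="moderate"),
    Node("social", weight=0.3, value="good"),
])

print(evaluate(farm.children[0]))  # moderate
print(evaluate(farm))              # moderate
```

In this toy tree, a poor GHG indicator and a good nutrient indicator offset each other within the environmental dimension. The averaging hides that trade-off in the aggregate. Inspecting intermediate nodes rather than only the top-level score helps reveal it. Explicit decision rules, as in DEXi, allow such compensation to be limited where experts judge it inappropriate.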