ABSTRACT The abundance of dark matter haloes is a key cosmological probe in forthcoming galaxy surveys. The theoretical understanding of the halo mass function (HMF) is limited by our incomplete knowledge of the origin of non-universality and its cosmological parameter dependence. We present a deep-learning model which compresses the linear matter power spectrum into three independent factors which are necessary and sufficient to describe the $z=0$ HMF from the state-of-the-art Aemulus emulator to sub-per cent accuracy in a wCDM$+N_\mathrm{eff}$ parameter space. Additional information about growth history does not improve the accuracy of HMF predictions if the matter power spectrum is already provided as input, because required aspects of the former can be inferred from the latter. The three factors carry information about the universal and non-universal aspects of the HMF, which we interrogate via the information-theoretic measure of mutual information. We find that non-universality is captured by recent growth history after matter-dark-energy equality and $N_{\rm eff}$ for $M\sim 10^{13} \, \mathrm{M_\odot }\, h^{-1}$ haloes, and by $\Omega _{\rm m}$ for $M\sim 10^{15} \, \mathrm{M_\odot }\, h^{-1}$. The compact representation learnt by our model can inform the design of emulator training sets to achieve high emulator accuracy with fewer simulations.