The attainment of a self-similar particle size distribution (PSD) is an interesting feature of many particulate processes such as milling. While it is well-known that the linear time-invariant population balance model (LTINV-PBM) admits a self-similar solution, the attainment of self-similarity is not well-understood within the context of non-linear breakage kinetics. Non-linear breakage kinetics emanating from mechanical multi-particle interactions is not uncommon in dense-phase dry milling processes and has been accounted for by the non-linear PBM (NL-PBM). Here, we investigate the dynamics and attainment of a self-similar PSD considering first-order (LTINV-PBM), non-uniform non-linear (NL-PBM Model B), and uniform non-linear (NL-PBM Model D) breakage kinetics. The PBMs were first fit to experimental data from the batch dry milling of quartz in a tumbling ball mill. Using the parameters estimated and simulating the impact of different extents of non-linear multi-particle interactions, we elucidated the impact of non-linear breakage kinetics on the evolution of the PSD to a self-similar one. In order to quantitatively assess self-similarity and thoroughly investigate the approach to a self-similar PSD during milling, we have introduced the use of the distance metric Canberra Distance. The simulations suggest that the retardation of the breakage of coarser particles due to their interactions with the finer particles can significantly delay the attainment of self-similarity and broaden the self-similar PSD attained. Both non-linear models predict attainment of self-similar PSDs, albeit slightly different, under cases of mild non-linear effects. However, when non-linear effects are severe, self-similarity can be delayed to prolonged milling in contrast to LTINV-PBM, which predicts the attainment of self-similarity shortly after the onset of milling. While LTINV-PBM and NL-PBMs can be easily discriminated via a single milling experiment, simulations suggest that only experiments utilizing various feed particle size distributions, preferentially including a bi-modal distribution, could discriminate between the non-uniform non-linear kinetics (Model B) and uniform non-linear kinetics (Model D).