Aims. We critically analysed the theoretical foundation and statistical reliability of the mixing-length calibration by means of standard (Teff, [Fe/H]) and global asteroseismic (Δν, νmax) observables of field stars. We also discussed the soundness of inferring a possible metallicity dependence of the mixing-length parameter from field stars.

Methods. We followed a theoretical approach based on mock datasets of artificial stars sampled from a grid of stellar models with a fixed mixing-length parameter αml. We then recovered the mixing-length parameter of the mock stars by means of the SCEPtER maximum-likelihood algorithm. Finally, we analysed the differences between the true and recovered mixing-length values, quantifying the random errors due to the observational uncertainties and the biases due to possible discrepancies in the chemical composition and input physics between the artificial stars and the models adopted in the recovery.

Results. We verified that the αml estimates are affected by a huge spread, even in the ideal configuration of perfect agreement between the mock data and the recovery grid of models. While the artificial stars were computed at a fixed solar-calibrated αml = 2.10, the recovered values had a mean of 2.20 and a standard deviation of 0.52. We then explored the case in which the solar heavy-element mixture used to compute the models differs from that adopted in the artificial stars. We found an estimated mixing-length mean of 2.24 ± 0.48 and, more interestingly, a metallicity relationship in which αml increases by 0.4 for an increase of 1 dex in [Fe/H]. Thus, a simple heavy-element mixture mismatch induced a spurious, but statistically robust, dependence of the estimated mixing-length on metallicity. The origin of this trend was further investigated by considering differences in the relation between the initial helium abundance Y, [Fe/H], and the initial metallicity Z assumed in the models and in the data.
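For orientation, the relation between Y, [Fe/H], and Z is commonly written as a linear helium-to-metal enrichment law combined with the definition of [Fe/H]. The form below is a sketch under that assumption; the abstract does not state the exact expression used. The primordial helium abundance Y_p and the solar ratio (Z/X)_⊙, which depends on the adopted heavy-element mixture, are symbols introduced here for illustration, and a solar-scaled mixture is assumed.

```latex
% Assumed linear enrichment law (Y_p: primordial helium abundance)
Y = Y_p + \frac{\Delta Y}{\Delta Z}\, Z ,
\qquad
X = 1 - Y - Z ,
\qquad
[\mathrm{Fe/H}] = \log_{10}\frac{Z}{X} - \log_{10}\left(\frac{Z}{X}\right)_{\odot}

% Solving for Z at a given [Fe/H]
Z = \frac{1 - Y_p}
         {1 + \dfrac{\Delta Y}{\Delta Z}
            + \dfrac{1}{(Z/X)_{\odot}\, 10^{[\mathrm{Fe/H}]}}}
```

Under these assumptions, a different (Z/X)_⊙, a different ΔY/ΔZ, or an offset in [Fe/H] each assigns a different (Y, Z) pair to the same observed [Fe/H].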
We found that a discrepancy in the adopted helium-to-metal enrichment ratio ΔY/ΔZ caused the appearance of spurious trends in the estimated mixing-length values. An underestimation of its value, from ΔY/ΔZ = 2.0 in the mock data to ΔY/ΔZ = 1.0 in the recovery grid, resulted in an increasing trend, while the opposite behaviour occurred for an equivalent overestimation. A similar effect was caused by an offset in the conversion from [Fe/H] to global metallicity Z. A systematic overestimation of [Fe/H] by 0.1 dex in the recovery grid of models forced an increasing trend of αml versus [Fe/H] of about 0.2 per dex. We also explored the impact of some possible discrepancies between the input physics adopted in the recovery grid of models and in the mock data. We observed an induced trend with metallicity of about Δαml = 0.3 per dex when the effect of microscopic diffusion is neglected in the recovery grid, while no trends originated from a wrong assumption on the effective temperature scale by ±100 K. Finally, we proved that the impact of different assumptions on the outer boundary conditions was apparent only in the red giant branch (RGB) phase.

Conclusions. We showed that the mixing-length estimates of field stars are affected by a huge spread even in the ideal case in which the stellar models used to estimate αml are exactly the same as those used to build the mock dataset. Moreover, we proved that many of the assumptions adopted in the stellar models used in the calibration can induce spurious trends of the estimated αml with [Fe/H]. Therefore, any attempt to calibrate the mixing-length parameter by means of Teff, [Fe/H], Δν, and νmax of field stars appears to have poor statistical reliability. As such, any claim about a possible dependence of the mixing-length on metallicity for field stars should be considered cautiously and critically.
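The mock-and-recover procedure described in the Methods can be illustrated with a minimal sketch. This is not the SCEPtER implementation: the forward model `toy_observables` is a hypothetical stand-in for a stellar-evolution grid, with invented exponents, and the estimate is simply the single grid point of maximum Gaussian likelihood. The solar reference values and the observational uncertainties (75 K in Teff, 2.5% in Δν, 5% in νmax) are assumptions chosen for illustration only, so the resulting numbers carry no physical meaning.

```python
import numpy as np

# Assumed solar reference values (K, microHz, microHz); illustrative only.
SUN = {"teff": 5777.0, "dnu": 135.1, "numax": 3090.0}


def toy_observables(mass, alpha_ml):
    """Hypothetical toy forward model, NOT a stellar-evolution code."""
    teff = SUN["teff"] * mass**0.6 * (alpha_ml / 2.10) ** 0.05   # invented exponents
    radius = mass**0.9 * (alpha_ml / 2.10) ** -0.10              # invented exponents
    # Standard asteroseismic scaling relations (solar units for mass, radius).
    dnu = SUN["dnu"] * np.sqrt(mass / radius**3)
    numax = SUN["numax"] * mass / (radius**2 * np.sqrt(teff / SUN["teff"]))
    return np.stack([teff, dnu, numax], axis=-1)


def build_grid():
    """Flattened (mass, alpha_ml) grid and its observables."""
    masses = np.linspace(0.8, 1.2, 41)
    alphas = np.round(np.linspace(1.0, 3.0, 21), 2)  # includes 2.10
    m, a = np.meshgrid(masses, alphas, indexing="ij")
    m, a = m.ravel(), a.ravel()
    return m, a, toy_observables(m, a)


def recover_alpha(obs, sigma, grid_alpha, grid_obs):
    """alpha_ml of the grid point with minimum chi^2 (maximum likelihood)."""
    resid = (obs[:, None, :] - grid_obs[None, :, :]) / sigma[:, None, :]
    chi2 = (resid**2).sum(axis=-1)
    return grid_alpha[np.argmin(chi2, axis=1)]


def run_mock_experiment(n_stars=500, true_alpha=2.10, noise_scale=1.0, seed=0):
    """Sample mock stars at fixed alpha_ml, perturb them, recover alpha_ml."""
    rng = np.random.default_rng(seed)
    _, grid_alpha, grid_obs = build_grid()
    candidates = np.flatnonzero(np.isclose(grid_alpha, true_alpha))
    truth = grid_obs[rng.choice(candidates, size=n_stars)]
    # Assumed 1-sigma uncertainties: 75 K, 2.5 %, 5 %.
    sigma = np.column_stack(
        [np.full(n_stars, 75.0), 0.025 * truth[:, 1], 0.05 * truth[:, 2]]
    )
    obs = truth + noise_scale * sigma * rng.standard_normal(truth.shape)
    return recover_alpha(obs, sigma, grid_alpha, grid_obs)


recovered = run_mock_experiment()
print(f"mean = {recovered.mean():.2f}, std = {recovered.std(ddof=1):.2f}")
```

Comparing the mean and standard deviation of `recovered` with the input value 2.10 mirrors the comparison between true and recovered values made in the abstract. Setting `noise_scale=0.0` returns the input value exactly, which isolates the contribution of the observational errors in this toy setting.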