Context. The field of galaxy evolution will make a great leap forward in the next decade as a consequence of the scientific community's large investment in multi-object spectroscopic facilities. Various future surveys will enormously increase the number of available galaxy spectra, providing new insights into unexplored areas of research. To maximise the impact of such incoming data, the analysis methods must also step up, extracting reliable information from the available spectra. It is therefore urgent to refine and test analysis tools that are able to reliably infer the properties of a galaxy from medium- or high-resolution spectra. Aims. In this paper we investigate the limits and the reliability of different spectral synthesis methods in the estimation of the mean stellar age and metallicity. These two quantities are fundamental for determining the assembly history of a galaxy because they provide key insights into its star formation history. The main question this work aims to address is which signal-to-noise ratios (S/N) are needed to reliably determine the mean stellar age and metallicity from a galaxy spectrum, and how this depends on the tool used to model the spectra. Methods. To address this question we built a set of realistic simulated spectra containing stellar and nebular emission, reproducing the evolution of a galaxy in two limiting cases: a constant star formation rate and an exponentially declining star formation with a single initial burst. We degraded the synthetic spectra built from these two star formation histories (SFHs) to different S/N and analysed them with three widely used spectral synthesis codes, namely FADO, STECKMAP, and STARLIGHT, assuming similar fitting set-ups and the same base of spectral templates. Results. For S/N ≤ 5 all three tools show a large diversity in their results. The FADO and STARLIGHT tools find median differences in the light-weighted mean stellar age of ∼0.1 dex, while STECKMAP shows a higher value of ∼0.2 dex. 
For S/N > 50 the median differences in FADO are ∼0.03 dex (∼7%), roughly a factor of 3 and 4 lower than the 0.08 dex (∼20%) and 0.11 dex (∼30%) obtained from STARLIGHT and STECKMAP, respectively. Detailed investigations of the best-fit spectrum for galaxies with overestimated mass-weighted quantities point towards the inability of purely stellar models to fit the observed spectral energy distribution around the Balmer jump. Conclusions. Our results imply that when a galaxy enters a phase of high specific star formation rate (sSFR), the neglect of the nebular continuum emission in the fitting process has a strong impact on the estimation of its SFH when purely stellar fitting codes are used, even in the presence of high-S/N spectra. The median values of these differences are of the order of 7% (FADO), 20% (STARLIGHT), and 30% (STECKMAP) for light-weighted quantities, and 20% (FADO), 60% (STARLIGHT), and 20% (STECKMAP) for mass-weighted quantities. More specifically, for a continuous SFH both STECKMAP and STARLIGHT overestimate the stellar age by > 2 dex within the first ∼100 Myr, even for high-S/N spectra. This bias, which stems from the neglect of nebular continuum emission, implies a severe overestimation of the mass-to-light ratio and stellar mass. Even in the presence of only a mild contribution from nebular continuum, however, the data can still be misinterpreted as a consequence of the poor quality of the observations. Our work underlines once more the importance of a self-consistent treatment of nebular emission, as implemented in FADO, which, according to our analysis, is the only viable route towards a reliable determination of the assembly of any high-sSFR galaxy at high and low redshift.
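The percentages quoted alongside the dex values follow from the usual conversion of a logarithmic offset into a fractional change, (10^Δ − 1) × 100. The short sketch below reproduces that arithmetic for the S/N > 50 light-weighted values quoted above. The function name `dex_to_percent` and the dictionary layout are illustrative assumptions, and the snippet is not part of FADO, STARLIGHT, or STECKMAP.

```python
def dex_to_percent(delta_dex: float) -> float:
    """Percentage change equivalent to a logarithmic offset given in dex."""
    return (10.0 ** delta_dex - 1.0) * 100.0


# Median light-weighted age differences at S/N > 50, as quoted in the abstract.
# The dictionary itself is only an illustrative container (assumption).
median_offsets_dex = {"FADO": 0.03, "STARLIGHT": 0.08, "STECKMAP": 0.11}

percent = {code: dex_to_percent(d) for code, d in median_offsets_dex.items()}

# Ratio of each offset (in dex) to the FADO value.
ratio_to_fado = {
    code: d / median_offsets_dex["FADO"] for code, d in median_offsets_dex.items()
}

for code, d in median_offsets_dex.items():
    print(f"{code}: {d:.2f} dex = {percent[code]:.1f}% "
          f"({ratio_to_fado[code]:.1f}x the FADO offset)")
```

Under this conversion the three offsets correspond to roughly 7%, 20%, and 29%, consistent with the rounded figures in the text, and the STARLIGHT and STECKMAP offsets are about 2.7 and 3.7 times the FADO one. By the same relation, an overestimate of 2 dex corresponds to a factor of 100.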