The effect of irrelevant sounds on short-term memory was investigated in two experiments using noise-vocoded speech stimuli (NVSS). Speech samples were systematically modified with a noise vocoder to create a set of stimuli ranging from amplitude-modulated white noise to intelligible speech. Eight NVSS conditions, with 1, 2, 4, 6, 9, 12, 15, and 18 frequency bands, served as distracting stimuli in a digit-recall task, alongside speech and silence conditions. Recall performance decreased as the number of frequency bands increased up to the 6-band condition; beyond six bands, the number of bands had no further influence on performance. The results were analyzed using four acoustic metrics proposed in the literature: the frequency domain correlation coefficient (FDCC), the fluctuation strength, the speech transmission index (STI), and the normalized covariance measure (NCM). None of the metrics successfully predicted the results. However, the parameter values of the FDCC, the STI, and the NCM indicated that a prediction model for the irrelevant sound effect should account for both temporal and spectral features of the irrelevant sounds.
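
For readers unfamiliar with noise vocoding, the general technique can be sketched as follows: the speech signal is split into frequency bands, the temporal envelope of each band is extracted, and each envelope modulates noise filtered into the same band. The sketch below is illustrative only and is not the processing used in the experiments. The logarithmic band spacing, the 100 to 6000 Hz range, the second-order band-pass filters, the 30 Hz envelope cutoff, and all function names are assumptions made for this example.

```python
import math
import random


def band_edges(n_bands, f_lo=100.0, f_hi=6000.0):
    """Log-spaced band edges in Hz (assumed spacing and range)."""
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (i / n_bands) for i in range(n_bands + 1)]


def bandpass(x, lo, hi, fs):
    """Second-order band-pass biquad, applied sample by sample."""
    f0 = math.sqrt(lo * hi)                      # geometric centre frequency
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * (hi - lo) / (2 * f0)  # sin(w0) / (2Q), Q = f0 / (hi - lo)
    b0, b2 = alpha, -alpha                       # b1 is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff=30.0):
    """Full-wave rectification followed by a one-pole low-pass filter."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    e, out = 0.0, []
    for xn in x:
        e += k * (abs(xn) - e)
        out.append(e)
    return out


def noise_vocode(signal, fs, n_bands, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    edges = band_edges(n_bands)
    out = [0.0] * len(signal)
    for lo, hi in zip(edges, edges[1:]):
        env = envelope(bandpass(signal, lo, hi, fs), fs)
        carrier = bandpass(noise, lo, hi, fs)
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out
```

Calling `noise_vocode(samples, fs, n_bands)` with `n_bands` set to 1, 2, 4, 6, 9, 12, 15, or 18 would produce one stimulus per NVSS condition under these assumptions. With a single band the output is broadband noise carrying only the overall amplitude envelope, and with more bands progressively more spectral detail is preserved.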