Modulated maskers produce less masking than unmodulated maskers, an effect referred to as masking release (MR). Both listening in the temporal dips and fast cochlear compression have been suggested as underlying mechanisms. We addressed the role of dip listening by measuring temporal integration in simultaneous masking using Schroeder-phase harmonic complexes (SPHCs) with various phase curvatures. In an experiment with six normal-hearing listeners, the SPHC masker and the pure-tone target were covaried in duration at a high masker level. The MR increased with stimulus duration, suggesting integration of target information across multiple masker dips. The duration dependence of the MR was predicted by a physiology-inspired model based on the temporal envelope modulation strength in the auditory periphery. The modeling analysis suggested that listeners detect the presence of the target by a reduction in fluctuation strength that results primarily from a decline in F0-based response peaks, an effect known as synchrony capture. The detailed pattern of masked thresholds across masker phase curvatures was not predicted by the model, suggesting that the model's phase response does not closely match the human phase response. Overall, temporal integration across neural envelope features associated with the masker dips seems to contribute to the MR with SPHCs.
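
As an illustration of how phase curvature shapes the masker envelope, the following minimal sketch generates an equal-amplitude harmonic complex. It assumes one common parameterization of Schroeder phases, theta_n = C * pi * n * (n + 1) / N, with scalar curvature C and N harmonics; the abstract does not state the formula used in the study. The function names, the fundamental frequency, the number of harmonics, and the sampling rate are hypothetical placeholders, not the study's stimulus parameters.

```python
import math


def schroeder_complex(f0, n_harmonics, curvature, fs, duration):
    """Equal-amplitude harmonic complex with Schroeder-type phases.

    Assumed phase rule (not taken from the abstract):
        theta_n = curvature * pi * n * (n + 1) / n_harmonics
    """
    n_samples = int(round(fs * duration))
    phases = [curvature * math.pi * n * (n + 1) / n_harmonics
              for n in range(1, n_harmonics + 1)]
    samples = []
    for i in range(n_samples):
        t = i / fs
        # Sum harmonics 1..n_harmonics of f0 at time t.
        samples.append(sum(
            math.cos(2 * math.pi * n * f0 * t + phases[n - 1])
            for n in range(1, n_harmonics + 1)))
    return samples


def crest_factor(samples):
    """Peak-to-RMS ratio, a simple index of how 'peaky' the waveform is."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms


# Illustrative values only: 100-Hz F0, 40 harmonics, 16-kHz sampling, 20 ms.
cosine_phase = schroeder_complex(100.0, 40, 0.0, 16000, 0.02)
schroeder_pos = schroeder_complex(100.0, 40, 1.0, 16000, 0.02)
print(crest_factor(cosine_phase), crest_factor(schroeder_pos))
```

Under this assumed parameterization, a curvature of zero places all harmonics in cosine phase and yields a strongly peaked waveform with long dips between the peaks, whereas a curvature of +1 or -1 yields a much flatter temporal envelope. The crest factor is used here only as a rough descriptor of the acoustic waveform; it does not capture the cochlear phase response that determines the internal envelope.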