Upon hearing objects collide, humans can estimate many of the underlying physical attributes, such as the objects’ material and mass. Although the physics of sound generation is well established, the inverse problem that listeners must solve – of inferring physical parameters from sound – remains poorly understood. In this work, we show that humans leverage an understanding of acoustical physics to constrain their perceptual inferences, allowing them to disambiguate multiple object properties from a single impact sound. We derived a linear generative model of impact sounds, combining theoretical acoustics with empirically measured statistics of object resonances. We used an analysis-by-synthesis algorithm to infer mode parameters from recorded object impulse responses. We then fit distributions to these parameters, from which object impulse responses could be sampled. Perceptual experiments demonstrated that humans could judge material and mass from sound alone, even when both of the underlying objects varied. However, performance with synthetic sounds was impaired if the simulated physical regularities were altered to be unnatural. The results suggest that listeners use internal physical models to separate the acoustic contributions of the objects that interact to create sound.
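To make the generative model concrete: in standard modal acoustics, an object's impulse response is a sum of exponentially decaying sinusoids, one per resonant mode, and material differences show up largely in the decay times. The sketch below illustrates that idea only; the function names, the specific frequency and decay ranges, and the material labels are illustrative assumptions, not the parameter distributions the authors actually measured and fit.

```python
import math
import random

def modal_impulse_response(modes, sr=44100, dur=0.5):
    """Synthesize an impulse response as a sum of exponentially
    decaying sinusoids, one per resonant mode.
    modes: list of (freq_hz, decay_time_s, amplitude) tuples."""
    n = int(sr * dur)
    out = [0.0] * n
    for f, tau, a in modes:
        for i in range(n):
            t = i / sr
            out[i] += a * math.exp(-t / tau) * math.sin(2 * math.pi * f * t)
    return out

def sample_modes(material, n_modes=5, rng=random):
    """Draw mode parameters from hypothetical material-dependent
    distributions: metal-like objects ring longer (larger decay
    times) than wood-like ones. Ranges here are placeholders."""
    decay_ranges = {"metal": (0.2, 1.0), "wood": (0.01, 0.05)}
    modes = []
    for _ in range(n_modes):
        f = rng.uniform(200.0, 4000.0)              # mode frequency (Hz)
        tau = rng.uniform(*decay_ranges[material])  # decay time (s)
        a = rng.uniform(0.1, 1.0)                   # mode amplitude
        modes.append((f, tau, a))
    return modes

ir = modal_impulse_response(sample_modes("wood"))
```

In such a model, inference runs the other direction: given a recorded impulse response, analysis-by-synthesis searches for the mode parameters whose synthesized sound best matches the recording, and fitting distributions over those parameters then allows new, statistically natural impulse responses to be sampled.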