Most human actions produce concomitant sounds. Action sounds can be either part of the action goal (GAS, goal-related action sounds), as for instance in tap dancing, or a mere by-product of the action (BAS, by-product action sounds), as for instance in hurdling. It is currently unclear whether these two types of action sounds, intentional or incidental, differ in their neural representation and whether their impact on the evaluation of action performance differs. Here, we examined whether auditory information is a more important factor for positive action quality ratings during the observation of tap dancing than during the observation of hurdling. Moreover, we tested whether observing tap dancing, compared to hurdling, led to stronger attenuation in primary auditory cortex and to a stronger mismatch signal when sounds did not match expectations. We recorded individual point-light videos of newly trained participants performing tap dancing and hurdling. In the subsequent functional magnetic resonance imaging (fMRI) session, participants were presented with the videos that displayed their own actions, including the corresponding action sounds, and were asked to rate the quality of their performance. Videos were presented either in their original form or scrambled in the visual modality, the auditory modality, or both. As hypothesized, behavioral results showed significantly lower rating scores in the GAS condition than in the BAS condition when the auditory modality was scrambled. Functional MRI contrasts between BAS and GAS actions revealed higher activation of primary auditory cortex in the BAS condition, which speaks in favor of stronger attenuation in GAS, as well as stronger activation of the posterior superior temporal gyri and the supplementary motor area in the GAS condition.
These results suggest that the processing of self-generated action sounds depends on whether we intend to produce a sound with our action, and that action sounds may be more likely to be used as sensory feedback when they are part of the explicit action goal. Our findings contribute to a better understanding of the function of action sounds for learning and controlling sound-producing actions.