Listening to sounds or music is not a homogeneous act of grasping meanings by hearing. Yet it is often portrayed as such, especially when the intentional stance of the listener is overlooked. This paper distinguishes listening as the action-oriented, intentional activity of making sense of the world. It argues that the multifaceted and heterogeneous nature of ‘understanding by listening’ can be outlined in terms of distinct modes of listening. Building upon previous accounts, a revised taxonomy of nine listening modes (reflexive, kinaesthetic, connotative, causal, empathetic, functional, semantic, reduced and critical listening) is proposed and illustrated with examples. The modes refer to different constituents of meaning-creation in the process of listening, and the taxonomy arranges them schematically into three levels (experiential, denotative and reflective). The theoretical framework of the revised taxonomy draws on an embodied cognition paradigm. The experiential basis of meaning in listening is conceived of as resonances emerging between experiential patterns of sensations, structured patterns of recurrent sensorimotor experiences (action–sound couplings) and the projection of action-relevant mental images. Finally, the taxonomy is discussed in terms of its implications for perception and cognition research on sounds and music.