As a hypothesis on the origins of mind and language, the evolutionary theory of the sensorimotor paradox suggests that the capacities for imagination, self-representation and abstraction operate from a dissociation in what is known as the forward model. In some studies, sensory perception is understood as a system of prediction and confirmation (feedforward and feedback processes) that would share common yet distinct and overlapping neural networks with mental imagery. The latter would then operate mostly through internal feedback processes. Our hypothesis is that dissociation and parallelism between these processes would make it less likely for an imagined prediction to match and coincide in time with any sensory feedback, contradicting the stimulus/response pattern. The gap between the two, and the effort required to maintain it, born from the development of bipedal stance and a radical change in our relation to our own hands, would be the very structural foundation of our capacity to elaborate abstract thoughts, by partially blocking and inhibiting motor action. Mental imagery would be structurally dissociated from perception while maintaining an intricate relation of interdependence with it. Moreover, the content of the images would be subordinate to their function as emotional regulators, prioritising consistency with a global, conditional and socially learnt body-image. We can speculate that, as a higher-level and proto-aesthetic function, the act and instrumentalisation of dissociating imagination from perception would itself become the prediction, and their coordination the expected feedback.