We construct a categorical framework of gestures that generalizes the topological approach of [G. Mazzola and A. Moreno, Diagrams, gestures and formulae in music, J. Math. Music 1 (2007), pp. 23–46] and culminates in the construction of a gesture bicategory, which enriches the classical Yoneda embedding. This framework could be a valid candidate for the conjectured space X in the diamond conjecture of the same paper. We discuss first applications to topological groups and, more concretely, to gestures in modulation processes in Beethoven's Hammerklavier Sonata. The latter offers a first concrete answer to Lewin's big question from [D. Lewin, Generalized Musical Intervals and Transformations, Yale University Press, New Haven, CT, 1987] concerning characteristic gestures. Yoneda's philosophy, as traced in his famous lemma, succeeds in reinterpreting objects and morphisms in abstract categories by their intuitive set-valued functors and natural transformations, i.e. we are provided with (Fregean) functions between (variable) sets rather than with completely encapsulated objects and arrows. This is the ‘objective’ half of Yoneda's insight. This research does not solve the second, ‘morphic’, half of Yoneda's philosophy, namely a replacement of Fregean functional abstraction by gestural dynamics, but its technical toolbox provides first steps towards a gestural Yoneda philosophy.