Abstract
Users of signed and spoken languages regularly engage in bodily enactment (commonly referred to as constructed action [CA] for signers and character viewpoint gestures [CVPT] for speakers) to create meaning, but comparatively few studies have addressed how linguistic grammar interfaces with such gestural depictive devices across language modalities. In speech, CVPT gestures have been shown to co-occur with transitive verbs and to appear when a reference is definite or more accessible in the discourse. In sign, CA often alternates sequentially with fully conventionalized signs. In both CVPT and CA demonstrations, syntactic and pragmatic factors appear to be important. In this work, we consider these patterns by examining short retellings of video-based elicitation stimuli (silent-movie segments) from 10 deaf users of American Sign Language (ASL) and 20 hearing speakers of English. We describe examples of signs and words that co-occur with or precede specific instances of CA and CVPT. We also examine distributions and degrees of enactment across participants in order to consider the question of gesture threshold (Hostetter and Alibali, 2008, 2019). We provide various examples of how gestural material interfaces with linguistic grammar, with implications for syntactic theory and for possible grammatical constraints on such communicative devices.