Both words and gestures have been shown to influence object categorization, often even overriding perceptual similarities in cueing category membership. However, gestures are often meaningful to infants, whereas words are arbitrarily related to the objects they refer to and are in this respect more similar to arbitrary actions that can be performed on objects. In this study, we examine how words and arbitrary actions shape category formation. Across three conditions (word cue, action cue, word-action cue), we presented infants (N = 90) with eight videos of objects from a single category that varied in colour and other perceptual features. Each object was accompanied by a word, an action performed on the object, or both. Infants in the word condition and in the action condition showed a decrease in looking over the course of the familiarization phase, indicating habituation to the category, but infants in the word-action condition did not. At test, infants saw a novel object from the just-learned category and a novel object from another category side by side on the screen. There was some evidence for an advantage for words in shaping early object categorization, although we note that this was not robust across analyses.