The development of large-scale corpora has led to a quantum leap in our understanding of speech in recent years. By contrast, the analysis of massive datasets has so far had limited impact on the study of gesture and other visual communicative behaviors. We used the UCLA-Red Hen Lab multi-billion-word repository of video recordings, all showing communicative behavior that was not elicited in a lab, to quantify the frequency of speech-gesture co-occurrence for a subset of linguistic expressions in American English. First, we establish objectively that gesture co-occurs with speech at systematically high rates in our subset of expressions, which consists of temporal phrases. Second, we show that there is a systematic alignment between the informativity of co-speech gestures and that of the verbal expressions with which they co-occur. By exposing deep, systematic relations between the modalities of gesture and speech, our results pave the way for the data-driven integration of multimodal behavior into our understanding of human communication.