Parental scaffolding, such as looking at and showing objects, has long been considered helpful for early attention and language development. However, relatively little is known about how parents' multimodal social cues work alone or together to guide an infant's attention toward referent items. The present study aims to document the dynamics of social referential input during an interactive play session and to specify the roles of different types of social cues in directing infant attention. Forty-three parent-infant dyads (infants aged 5.0 to 18.0 months) in the U.S. completed a short play session recorded by head-mounted cameras with eye trackers. The findings suggest that joint attention between parent and infant toward the same referent item often co-occurred with other referential input. Infants were more likely to maintain sustained attention to an object when the parent looked at the same item and named it explicitly. This was not the case when parent object looking accompanied other utterances, such as "Look!" or the child's name. The present study highlights the importance of multimodal referential input, which creates enriched opportunities for children to become sensitive to social input and to develop sustained attention for further learning.