One of the main benefits of large interactive surfaces (e.g., wall-sized displays) lies in their support for collocated collaboration: they facilitate simultaneous interaction with the display and high awareness of other group members' actions. In remote collaboration, this awareness information must instead be acquired through digital means such as video feeds, which typically convey very little non-verbal communication, including the information needed for workspace awareness. We describe a new approach to this challenge, implemented as a multimodal pipeline that tracks, attributes, transmits, and visualises non-verbal information, in the form of what we call workspace awareness cues, across wall-sized displays at distant locations. Our approach relies on commodity depth cameras combined with screen configuration information to generate deictic cues such as pointing targets and gaze direction. It also leverages recent artificial intelligence advances to attribute these cues to identified individuals and to augment them with additional gestural interactions. In the present paper, we expand on the details and rationale behind our approach, describe its technical implementation, validate its novelty with regard to the existing literature, and report early but promising results from an evaluation based on a mixed-presence decision-making scenario across two distant wall-sized displays.
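
To make the combination of depth-camera tracking and screen configuration information more concrete, the sketch below illustrates one plausible way to derive a pointing target: intersecting the shoulder-to-wrist ray reported by a depth camera with a configured display plane to obtain pixel coordinates. This is our own minimal illustration under stated assumptions, not the authors' implementation; the screen parameters, joint choice, and numpy dependency are all hypothetical, and gaze direction could be handled analogously by substituting a head-pose ray.

    # Illustrative sketch (not the paper's code): mapping a pointing gesture,
    # tracked by a depth camera, onto a wall-sized display described by a
    # hypothetical screen configuration.
    import numpy as np

    # Assumed screen configuration: top-left corner, orientation vectors,
    # physical size in metres, and resolution in pixels (all illustrative).
    SCREEN_ORIGIN = np.array([0.0, 2.0, 0.0])   # top-left corner, metres (camera space)
    SCREEN_RIGHT  = np.array([1.0, 0.0, 0.0])   # unit vector along the screen's width
    SCREEN_DOWN   = np.array([0.0, -1.0, 0.0])  # unit vector along the screen's height
    SCREEN_SIZE_M = (5.9, 1.96)                 # physical width, height in metres
    SCREEN_RES_PX = (14400, 4800)               # pixel width, height

    def pointing_target(shoulder, wrist):
        """Intersect the shoulder->wrist ray with the display plane and return
        the target in pixel coordinates, or None if it misses the screen."""
        normal = np.cross(SCREEN_RIGHT, SCREEN_DOWN)   # display plane normal
        direction = wrist - shoulder
        denom = np.dot(direction, normal)
        if abs(denom) < 1e-6:
            return None                                # ray parallel to the screen
        t = np.dot(SCREEN_ORIGIN - shoulder, normal) / denom
        if t <= 0:
            return None                                # pointing away from the screen
        hit = shoulder + t * direction                 # intersection point, metres
        # Express the hit point in the screen's own 2D coordinate system.
        offset = hit - SCREEN_ORIGIN
        u = np.dot(offset, SCREEN_RIGHT) / SCREEN_SIZE_M[0]
        v = np.dot(offset, SCREEN_DOWN) / SCREEN_SIZE_M[1]
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            return None                                # outside the display surface
        return int(u * SCREEN_RES_PX[0]), int(v * SCREEN_RES_PX[1])

    # Example: joint positions reported in the same metric space as the screen.
    print(pointing_target(np.array([2.5, 1.4, 3.0]), np.array([2.6, 1.3, 2.5])))

In a full pipeline of the kind described above, such a target would then be attributed to an identified individual and transmitted to the remote display for visualisation as a workspace awareness cue.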