Emerging multi-hop wireless networks provide a low-cost and flexible infrastructure that can be simultaneously utilized by multiple users for a variety of applications, including delay-sensitive multimedia transmission. However, this wireless infrastructure is often unreliable and provides dynamically varying resources with only limited quality of service (QoS) support for multimedia applications. To cope with the time-varying QoS, existing algorithms often rely on non-scalable, flow-based optimizations to allocate the available network resources (paths and transmission opportunities) across the multimedia users. Moreover, previous research seldom jointly optimizes dynamic routing with the adaptation and protection techniques available at the medium access control (MAC) or physical (PHY) layers. In this paper, we propose a distributed packet-based cross-layer algorithm to maximize the decoded video quality of multiple users engaged in simultaneous real-time streaming sessions over the same multi-hop wireless network. Our algorithm explicitly considers packet-based distortion impact and delay constraints in assigning priorities to packets, and then relies on priority queuing to drive the optimization of the users' transmission strategies across the protocol layers as well as across the multi-hop network. The proposed solution is enabled by the scalable coding of the video content (i.e., users can transmit and consume video at different quality levels) and by the cross-layer optimization strategies, which allow priority-based adaptation to varying channel conditions and available resources. The cross-layer strategies (application-layer packet scheduling, the relay selection policy, the MAC retransmission strategies, and the PHY modulation and coding schemes) are optimized per packet, at each node, in a distributed manner.
The main component of the proposed solution is a low-complexity, distributed, and dynamic routing algorithm, which relies on prioritized queuing to select the path and time reservation for each packet, while explicitly considering instantaneous channel conditions, queuing delays, and the resulting interference. Our results demonstrate the merits of, and the need for, end-to-end cross-layer optimization in order to provide an efficient solution for real-time video transmission using existing protocols and infrastructures. Importantly, our proposed delay-driven, packet-based transmission is superior in terms of both network scalability and video quality performance to previous flow-based solutions that statically allocate resources based on predetermined paths and rate requirements. In addition, the results provide important insights that can guide the design of network infrastructures and protocols for video streaming.
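As an illustration only, the per-node prioritized queuing described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm proposed in the paper: the names `Packet`, `NodeQueue`, `distortion_impact`, `deadline`, and `expected_delay` are hypothetical, priority is taken to be the distortion impact alone, and a packet is discarded when a given estimate of its remaining delay would make it miss its deadline.

```python
import heapq
import itertools
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    user: str
    distortion_impact: float  # assumed: quality loss if this packet is not decoded
    deadline: float           # assumed: latest useful arrival time, in seconds


class NodeQueue:
    """Priority queue held at one node; highest distortion impact is served first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, packet: Packet) -> None:
        # heapq is a min-heap, so the impact is negated to serve the largest first.
        entry = (-packet.distortion_impact, next(self._counter), packet)
        heapq.heappush(self._heap, entry)

    def pop_sendable(self, now: float, expected_delay: float) -> Optional[Packet]:
        """Return the highest-priority packet that can still meet its deadline.

        Packets that would arrive late are dropped, since a late packet
        cannot contribute to the decoded video quality.
        """
        while self._heap:
            _, _, packet = heapq.heappop(self._heap)
            if now + expected_delay <= packet.deadline:
                return packet
        return None

    def __len__(self) -> int:
        return len(self._heap)


queue = NodeQueue()
queue.push(Packet(user="A", distortion_impact=5.0, deadline=0.5))
queue.push(Packet(user="B", distortion_impact=9.0, deadline=0.05))
queue.push(Packet(user="C", distortion_impact=2.0, deadline=1.0))

first = queue.pop_sendable(now=0.0, expected_delay=0.1)
second = queue.pop_sendable(now=0.0, expected_delay=0.1)
third = queue.pop_sendable(now=0.0, expected_delay=0.1)
```

In this sketch the packet from user B has the highest impact but is dropped because it cannot arrive before its deadline, so the packet from user A is served first. A fuller treatment would also choose the next relay, the MAC retransmission limit, and the PHY modulation and coding scheme per packet, which are left out here.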