Deep Neural Networks (DNNs) are widely used in Cyber–Physical Systems (CPS), which often involve multiple DNN tasks with varying real-time requirements. These tasks must be deployed on a single resource-constrained embedded hardware platform, such as an embedded GPU. Efficiently sharing one embedded GPU among multiple real-time DNN tasks is challenging: existing DNN frameworks (e.g., PyTorch and TensorFlow) focus on maximizing average performance and throughput on GPUs, but lack scheduling mechanisms that account for multiple DNNs with different timing requirements. In this paper, we address this challenge by thoroughly examining and summarizing the scheduling rules for kernels with different priorities in CUDA streams. Based on these rules, we design a framework that supports real-time multi-DNN inference and propose a method for allocating CUDA streams to DNN kernels that meets schedulability requirements while maximizing GPU resource utilization. We implement our approach on an NVIDIA Jetson AGX Xavier embedded GPU and validate it with several popular DNNs. The results show that our approach achieves shorter response times than several state-of-the-art methods.