Abstract

Parallel scheduling of multiple real-time applications onto heterogeneous processors is needed in emerging embedded systems such as self-driving cars, smart cameras, and smartphones. Assuming that an embedded application is specified as a synchronous dataflow (SDF) graph or one of its extensions, we propose a novel parallel scheduling methodology based on an evolutionary algorithm, in which the mapping of tasks onto processors is iteratively evolved to optimize a given objective function. In each iteration, we use an existing worst-case response time (WCRT) analysis tool to check whether all applications satisfy their real-time requirements; to do so, each SDF graph is translated into the directed acyclic graph (DAG) representation that the WCRT analysis tool assumes. Since the WCRT analysis must be performed in every iteration of the evolution, and its running time depends on the number of nodes and their dependencies, we propose a clustering technique that drastically reduces the analysis time. We formally prove that the proposed clustering technique does not change the estimated WCRT of any application. The effectiveness of the proposed scheduling methodology and clustering technique is verified through extensive experiments using real-life benchmarks and randomly generated graphs, and through a comparison with an existing technique.
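The SDF-to-DAG translation mentioned above can be illustrated by the standard expansion of one graph iteration, in which each actor is replicated once per firing given by the repetition vector and firings are connected according to which tokens they exchange. The following is a minimal sketch, assuming an acyclic SDF graph with no initial tokens and a precomputed repetition vector; the function name `expand_sdf_to_dag` and the edge-tuple format are hypothetical and are not taken from the paper or from its WCRT analysis tool.

```python
def expand_sdf_to_dag(edges, reps):
    """Expand one iteration of an acyclic SDF graph into a DAG of firings.

    edges: list of (src, dst, prod_rate, cons_rate) tuples; this sketch
           assumes no initial tokens (delays) on any edge.
    reps:  repetition vector, mapping each actor to its firings per iteration.
    Returns (nodes, dag_edges), where each node is an (actor, firing_index) pair.
    """
    nodes = [(actor, i) for actor, count in reps.items() for i in range(count)]
    dag_edges = set()
    for src, dst, prod, cons in edges:
        for j in range(reps[dst]):
            # Firing j of dst consumes tokens j*cons .. (j+1)*cons - 1 (0-indexed);
            # token k on this edge is produced by firing k // prod of src.
            first = (j * cons) // prod
            last = ((j + 1) * cons - 1) // prod
            for i in range(first, last + 1):
                dag_edges.add(((src, i), (dst, j)))
    return nodes, dag_edges


# A produces 2 tokens per firing and B consumes 3, so A fires 3 times and B twice.
nodes, dag_edges = expand_sdf_to_dag([("A", "B", 2, 3)], {"A": 3, "B": 2})
for edge in sorted(dag_edges):
    print(edge)
```

Here firing B0 depends on A0 and A1, and B1 depends on A1 and A2. The node count of the expanded DAG grows with the repetition vector, which is consistent with the observation above that the analysis time depends on the number of nodes and their dependencies.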

Highlights

  • To cope with the increasing user demand for compute-intensive deep learning applications, embedded systems are increasingly equipped with heterogeneous processing elements (PEs), including a multi-core CPU, a GPU, and/or a deep learning accelerator called a Neural Processing Unit (NPU)

  • We assume that an embedded application is specified as a synchronous dataflow (SDF) [2] graph or one of its extensions

  • Since the number of data samples consumed from each input port or produced on each output port per task execution is fixed in the SDF model, an execution schedule of tasks can be constructed statically
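The statically constructible schedule mentioned in the last highlight rests on the SDF balance equations: for every edge, the producer's firing count times its production rate must equal the consumer's firing count times its consumption rate. The sketch below, which assumes a connected graph and uses hypothetical function names, solves these equations for the smallest positive integer repetition vector and, for the acyclic case without initial tokens, derives a simple single-appearance sequential schedule. It is a textbook illustration, not the scheduling method proposed in the paper.

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from graphlib import TopologicalSorter
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, prod_rate, cons_rate) tuples.
    Returns the smallest positive integer repetition vector as a dict.
    """
    neighbors = defaultdict(list)
    for src, dst, prod, cons in edges:
        neighbors[src].append((dst, Fraction(prod, cons)))  # q[dst] = q[src] * prod / cons
        neighbors[dst].append((src, Fraction(cons, prod)))  # q[src] = q[dst] * cons / prod
    rate = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:  # propagate relative firing rates breadth-first
        actor = queue.popleft()
        for other, ratio in neighbors[actor]:
            if other not in rate:
                rate[other] = rate[actor] * ratio
                queue.append(other)
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # every edge must balance
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = reduce(lcm, (r.denominator for r in rate.values()))
    q = {a: int(rate[a] * scale) for a in actors}
    common = reduce(gcd, q.values())
    return {a: n // common for a, n in q.items()}


def static_schedule(actors, edges):
    """Single-appearance schedule for an acyclic SDF graph without initial
    tokens: fire each actor q[a] times, visiting actors in topological order."""
    q = repetition_vector(actors, edges)
    preds = {a: set() for a in actors}
    for src, dst, _, _ in edges:
        preds[dst].add(src)
    return [(a, q[a]) for a in TopologicalSorter(preds).static_order()]


edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetition_vector(["A", "B", "C"], edges))  # {'A': 3, 'B': 2, 'C': 1}
print(static_schedule(["A", "B", "C"], edges))    # [('A', 3), ('B', 2), ('C', 1)]
```

Firing each actor all of its q firings in topological order is valid here because, by the time a consumer fires, every producer feeding it has already emitted exactly the number of tokens the consumer needs for one iteration.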


Summary

INTRODUCTION

To cope with the increasing user demand for compute-intensive deep learning applications, embedded systems are increasingly equipped with heterogeneous processing elements (PEs), including a multi-core CPU, a GPU, and/or a deep learning accelerator called a Neural Processing Unit (NPU). The key constraint on node clustering is that it must not change the real-time performance, taking into account all possible interference scenarios between applications for the given mapping and scheduling information. This constraint distinguishes the proposed clustering technique from existing SDF clustering techniques ([18], [19]), which do not consider mapping and scheduling. A novel parallel scheduling technique based on an evolutionary algorithm is proposed to schedule multiple SDF graphs with diverse real-time characteristics onto heterogeneous PEs. To evaluate the performance of each mapping candidate, it uses an existing WCRT analysis tool. We formally prove that the proposed clustering technique does not change the real-time performance estimated by the WCRT analysis tool.
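The evolutionary mapping loop described above can be sketched as a plain genetic algorithm over task-to-PE assignments. This is a minimal sketch under loud assumptions: the `evaluate` function is a hypothetical toy proxy (the maximum per-PE load computed from a per-PE execution-time table) standing in for the WCRT analysis tool that the paper actually uses, and the truncation selection, one-point crossover, and per-gene mutation operators are generic choices, not the paper's operators.

```python
import random


def evaluate(mapping, exec_time):
    """Hypothetical stand-in for WCRT analysis: the largest total execution
    time assigned to any single PE (lower is better)."""
    load = [0] * len(exec_time[0])
    for task, pe in enumerate(mapping):
        load[pe] += exec_time[task][pe]
    return max(load)


def evolve_mapping(exec_time, pop_size=20, generations=100, mut_rate=0.1, seed=0):
    """Evolve a task-to-PE mapping (mapping[task] = PE index) with a basic GA.
    exec_time[task][pe] is the assumed execution time of a task on a PE."""
    rng = random.Random(seed)
    n_tasks, n_pes = len(exec_time), len(exec_time[0])
    population = [[rng.randrange(n_pes) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: evaluate(m, exec_time))
        parents = population[: pop_size // 2]  # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)  # one-point crossover (needs >= 2 tasks)
            child = a[:cut] + b[cut:]
            for task in range(n_tasks):
                if rng.random() < mut_rate:  # per-gene random reassignment
                    child[task] = rng.randrange(n_pes)
            children.append(child)
        population = parents + children
    best = min(population, key=lambda m: evaluate(m, exec_time))
    return best, evaluate(best, exec_time)


exec_time = [[4, 1], [4, 1], [1, 4], [1, 4]]  # tasks 0-1 are faster on PE 1, tasks 2-3 on PE 0
best, cost = evolve_mapping(exec_time)
print(best, cost)
```

Because the fitness function is called for every candidate in every generation, its cost dominates the loop; in the methodology described here that role is played by the WCRT analysis, which is why reducing the analysis time through clustering matters.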

RELATED WORK
PARALLEL SCHEDULING METHODOLOGY
NODE CLUSTERING TECHNIQUE
SUPPORTING NON-PREEMPTIVE PROCESSING ELEMENTS
TIME COMPLEXITY
DEPENDENCY RELAXATION OPTIMIZATION
EXPERIMENT
Findings
VIII. CONCLUSION AND FUTURE WORK