Abstract
Parallel scheduling of multiple real-time applications onto heterogeneous processors is needed in emerging embedded systems such as self-driving cars, smart cameras, and smartphones. Assuming that an embedded application is specified as a synchronous dataflow (SDF) graph or one of its extensions, we propose a novel parallel scheduling methodology based on an evolutionary algorithm, in which the mapping of tasks onto processors is iteratively evolved to optimize a given objective function. In each iteration, we use an existing worst-case response time (WCRT) analysis tool to check whether all applications satisfy their real-time requirements. To do so, each SDF graph is translated into the directed acyclic graph (DAG) form that the WCRT analysis tool assumes. The WCRT analysis must be performed in every iteration of the evolution, and its running time depends on the number of nodes and their dependencies. We therefore propose a clustering technique that drastically reduces the analysis time. We formally prove that the proposed clustering technique does not change the estimated WCRT of any application. The effectiveness of the proposed scheduling methodology with the clustering technique is verified through extensive experiments using real-life benchmarks and randomly generated graphs, as well as a comparison with an existing technique.
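The SDF-to-DAG translation mentioned above can be illustrated with a minimal Python sketch. It expands one iteration of an SDF graph into a DAG whose nodes are individual actor firings. It relies on assumptions not stated in the abstract: the graph is acyclic with no initial tokens (delays), and a valid repetition vector `reps` is already known. The names `sdf_to_dag`, `Edge`, `Node`, and `reps` are hypothetical, and the DAG format expected by the actual WCRT analysis tool may differ.

```python
from typing import Dict, List, Set, Tuple

# (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]
# A DAG node is one firing of an actor: (actor name, firing index in the iteration)
Node = Tuple[str, int]


def sdf_to_dag(edges: List[Edge], reps: Dict[str, int]) -> Dict[Node, Set[Node]]:
    """Expand one iteration of a delay-free SDF graph into a DAG.

    Returns, for every firing, the set of firings it must wait for.
    `reps` must be a valid repetition vector for `edges`.
    """
    preds: Dict[Node, Set[Node]] = {
        (actor, i): set() for actor, count in reps.items() for i in range(count)
    }
    for src, dst, prod, cons in edges:
        for j in range(reps[dst]):
            # Firing j of dst reads tokens j*cons .. (j+1)*cons - 1 of this edge,
            # and token k is written by firing k // prod of src.
            first = (j * cons) // prod
            last = ((j + 1) * cons - 1) // prod
            for i in range(first, last + 1):
                preds[(dst, j)].add((src, i))
    # Successive firings of the same actor are left unordered; a stateful actor
    # would additionally need (actor, i) -> (actor, i + 1) edges.
    return preds
```

For example, with a single edge A->B that produces 2 tokens and consumes 3, and repetition counts A=3 and B=2, firing B0 depends on A0 and A1, and firing B1 depends on A1 and A2.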
Highlights
To cope with increasing user demand for compute-intensive deep learning applications, embedded systems tend to be equipped with heterogeneous processing elements (PEs), including a multi-core CPU, a GPU, and/or a deep learning accelerator called a Neural Processing Unit (NPU).
We assume that an embedded application is specified as a synchronous dataflow (SDF) [2] graph or one of its extensions.
Since the number of data samples consumed from each input port or produced to each output port per task execution is fixed in the SDF model, an execution schedule of tasks can be constructed statically.
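Static scheduling in SDF rests on the balance equations: for every edge, the producer's firing count times its production rate must equal the consumer's firing count times its consumption rate. The following minimal Python sketch solves these equations for the smallest positive integer firing counts (the repetition vector). It assumes a connected graph given as a hypothetical list of `(producer, consumer, produced, consumed)` tuples, and it is a textbook illustration rather than the paper's implementation.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm  # math.lcm requires Python 3.9+
from typing import Dict, List, Tuple

# (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]


def repetition_vector(edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integers q with q[src] * prod == q[dst] * cons on every edge.

    Raises ValueError if the rates are inconsistent. Assumes a connected graph.
    """
    adj: Dict[str, List[Tuple[str, Fraction]]] = {}
    for src, dst, prod, cons in edges:
        # q[dst] = q[src] * prod / cons  and  q[src] = q[dst] * cons / prod
        adj.setdefault(src, []).append((dst, Fraction(prod, cons)))
        adj.setdefault(dst, []).append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary start actor (BFS).
    start = next(iter(adj))
    rate: Dict[str, Fraction] = {start: Fraction(1)}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, ratio in adj[u]:
            if v not in rate:
                rate[v] = rate[u] * ratio
                queue.append(v)

    # Re-check every balance equation; this catches conflicting reconvergent paths.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent SDF rates on edge {src}->{dst}")

    # Scale the rational rates to integers, then divide out the common factor.
    denom = 1
    for r in rate.values():
        denom = lcm(denom, r.denominator)
    counts = {actor: int(r * denom) for actor, r in rate.items()}
    g = 0
    for n in counts.values():
        g = gcd(g, n)
    return {actor: n // g for actor, n in counts.items()}
```

For example, edges A->B with rates (2, 3) and B->C with rates (1, 2) yield `{'A': 3, 'B': 2, 'C': 1}`: A fires three times per iteration and produces six tokens, which B consumes in two firings.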
Summary
To cope with increasing user demand for compute-intensive deep learning applications, embedded systems tend to be equipped with heterogeneous processing elements (PEs), including a multi-core CPU, a GPU, and/or a deep learning accelerator called a Neural Processing Unit (NPU). The key constraint on node clustering is that it must not change the real-time performance under any possible interference scenario between applications, given the mapping and scheduling information of the applications. This constraint distinguishes the proposed clustering technique from existing SDF clustering techniques ([18], [19]), which do not consider mapping and scheduling. A novel parallel scheduling technique based on an evolutionary algorithm is proposed to schedule multiple SDF graphs with diverse real-time characteristics onto heterogeneous PEs. To evaluate the performance of each mapping candidate, it uses an existing WCRT analysis tool. We formally prove that the proposed clustering technique does not change the real-time performance estimated by the WCRT analysis tool.
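The evolutionary mapping loop described above can be sketched as follows. This is a generic genetic algorithm chosen for illustration, using tournament selection, uniform crossover, random-reassignment mutation, and elitism; the paper's actual encoding and operators are not described here. In particular, the paper evaluates each mapping candidate with an existing WCRT analysis tool, whereas this sketch substitutes a hypothetical cost: the total execution time on the busiest PE, computed from an assumed per-task, per-PE execution-time table. The names `max_pe_load`, `evolve_mapping`, and `exec_time` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mapping = List[int]  # mapping[t] is the index of the PE assigned to task t


def max_pe_load(exec_time: Sequence[Sequence[float]]) -> Callable[[Mapping], float]:
    """Hypothetical stand-in cost: total execution time on the busiest PE.

    exec_time[t][p] is the (assumed) execution time of task t on PE p.
    A real flow would run a WCRT analysis here instead.
    """
    num_pes = len(exec_time[0])

    def evaluate(mapping: Mapping) -> float:
        load = [0.0] * num_pes
        for task, pe in enumerate(mapping):
            load[pe] += exec_time[task][pe]
        return max(load)

    return evaluate


def evolve_mapping(
    num_tasks: int,
    num_pes: int,
    evaluate: Callable[[Mapping], float],
    pop_size: int = 30,
    generations: int = 200,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Mapping, float]:
    """Minimize evaluate(mapping) with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.randrange(num_pes) for _ in range(num_tasks)] for _ in range(pop_size)]
    best = min(pop, key=evaluate)

    def tournament(population: List[Mapping]) -> Mapping:
        # Binary tournament: the lower-cost of two random candidates wins.
        a, b = rng.sample(population, 2)
        return a if evaluate(a) <= evaluate(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mapping forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each task takes its PE from one parent at random.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: occasionally move a task to a random PE.
            for t in range(num_tasks):
                if rng.random() < mutation_rate:
                    child[t] = rng.randrange(num_pes)
            children.append(child)
        pop = children
        best = min(pop, key=evaluate)
    return best, evaluate(best)
```

Because the best mapping is copied into each new population, the returned cost never increases from one generation to the next. The sketch also shows why evaluation cost matters: `evaluate` runs many times per generation, so when it is a full WCRT analysis, reducing that analysis time (the motivation for the clustering technique) directly shortens the whole search.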