Abstract

The rapid evolution of Cloud-based services and the growing interest in deep learning (DL)-based applications are putting increasing pressure on hyperscalers and general-purpose hardware designers to provide more efficient and scalable systems. Cloud-based infrastructures must be built from more energy-efficient components, and this evolution must span from the core of the infrastructure (i.e., data centers (DCs)) to its edges (Edge computing) to adequately support new and future applications. Adaptability (elasticity) is one of the features required to increase performance-to-power ratios. Hardware-based mechanisms have been proposed to support system reconfiguration, mostly at the level of the processing elements, while fewer studies have addressed scalable, modular interconnection subsystems. In this paper, we propose a scalable Software Defined Network-on-Chip (SDNoC)-based architecture. By leveraging a modular design approach, our solution can easily be adapted to support devices ranging from low-power computing nodes placed at the edge of the Cloud to high-performance many-core processors in the Cloud DCs. The proposed design merges the benefits of hierarchical network-on-chip (NoC) topologies (by fusing the ring and the 2D-mesh topologies) with those of dynamic reconfiguration (i.e., adaptation). Our interconnect allows different types of virtualised topologies to be created, each serving different communication requirements, and thus provides better resource partitioning (virtual tiles) for concurrent tasks. To let the software layer control and monitor the NoC subsystem, a few customised instructions supporting a data-driven program execution model (PXM) are added to the instruction set architecture (ISA) of the processing elements. In general, data-driven programming and execution models are well suited to DL applications.
To ease programming of the proposed system, we also introduce a mechanism that maps a high-level programming language embedding concurrent execution models onto the basic functions offered by our SDNoC. In the reported experiments, we compared our lightweight reconfigurable architecture with a conventional flattened 2D-mesh interconnection subsystem. Results show that our design increases data traffic throughput and reduces average packet latency compared with the flattened 2D-mesh topology connecting the same number of processing elements (PEs) (up to 1024 cores). Power consumption and resource usage (on FPGA devices) are also low, confirming the good scalability of the proposed architecture.

Highlights

  • Cloud-based execution environments are in place to process complex machine learning (ML) algorithms

  • To reduce the area and energy costs associated with the implementation of this look-up table (LUT), we found that providing up to 256 processing elements (PEs) in a single virtual tile (VT) is sufficient to map ML/deep learning (DL) algorithms well

  • Unlike the von Neumann execution model, data-driven models require, for each thread, a private block of memory to store the inputs the thread needs in order to run, a counter storing the number of inputs not yet received, and a pointer to the thread body
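
The three pieces of per-thread state named in the last highlight can be illustrated with a small Go sketch. This is not the paper's implementation: the `DDThread` type, its field names, and the `Deliver` method are hypothetical, and the sketch assumes a single producer context and that each input slot is written exactly once.

```go
package main

import "fmt"

// DDThread is a hypothetical descriptor for a data-driven thread.
// It holds the three items a data-driven model needs per thread.
type DDThread struct {
	inputs  []int           // private block of memory storing the inputs
	pending int             // number of inputs not yet received
	body    func([]int) int // pointer to the thread body
}

// NewDDThread creates a thread that waits for n inputs before running body.
func NewDDThread(n int, body func([]int) int) *DDThread {
	return &DDThread{inputs: make([]int, n), pending: n, body: body}
}

// Deliver stores one input and decrements the pending counter.
// The body runs only when the counter reaches zero.
// Assumption: each slot is delivered exactly once.
func (t *DDThread) Deliver(slot, value int) (result int, fired bool) {
	t.inputs[slot] = value
	t.pending--
	if t.pending > 0 {
		return 0, false
	}
	return t.body(t.inputs), true
}

func main() {
	sum := NewDDThread(2, func(in []int) int { return in[0] + in[1] })

	_, fired := sum.Deliver(0, 3)
	fmt.Println(fired) // false: one input is still missing

	r, fired := sum.Deliver(1, 4)
	fmt.Println(r, fired) // 7 true: all inputs arrived, the body ran
}
```

The point of the sketch is that execution is triggered by data arrival rather than by a program counter reaching the thread: nothing runs until the counter of missing inputs drops to zero.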


Summary

Introduction

Cloud-based execution environments are in place to process complex machine learning (ML) algorithms. There is a flurry of research on designing more efficient DL-based algorithms and custom hardware accelerators to execute them better (such as the Xilinx reconfigurable acceleration stack). Most of these accelerators are spatial (i.e., an array of interconnected PEs), with input data processed following a data-driven approach. Internal hardware counters are read via dedicated instructions. This information can be exploited by optimisation tools and compilers to better adapt to the communication patterns of an application. Productivity is improved by introducing data-driven PXM support into a high-level programming language, allowing the application developer to readily exploit the benefits of interconnection reconfigurability and data-driven execution.

System Overview
Challenges and State-of-the-Art
Paper Contribution
Network-on-Chip Architecture
Router Micro-Architecture
Data Packet Structure and Control Flow
Ring Switch Micro-Architecture
NoC Adaptability
Software Interface
High-Level Programming Interface
Data-Driven PXM
Mapping Goroutines on DD-Threads
Linking NoC Software Interface
Evaluation
Network Performance
Area Cost and Power Consumption
Conclusions and Future Work
