SPECTRUM

Vanchinathan Venkataramani,Aditi Kulkarni,Tulika Mitra,Li-Shiuan Peh

doi:10.1145/3400032

Abstract

Wireless communication standards such as Long-term Evolution (LTE) are rapidly changing to support the high data-rate of wireless devices. The physical layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing basestation transceivers utilize customized DSP cores or fixed-function hardware accelerators for physical layer baseband processing. However, these approaches incur significant non-recurring engineering costs and are inflexible to newer standards or updates. Software-programmable processors offer more adaptability. However, it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low-power on shared-memory many-core architectures featuring inherently unpredictable design choices, such as caches and Network-on-chip (NoC). We propose SPECTRUM , a predictable, software-defined many-core architecture that exploits the massive parallelism of the LTE/5G baseband processing workload. The focus is on designing scalable lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled NoC that orchestrates the communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared to many-core architecture like Skylake-SP (average power 215 W) that drops 14% packets at high-traffic load, 256-core SPECTRUM by definition has zero packet drop rate at significantly lower average power of 24 W. SPECTRUM consumes 2.11× lower power than C66x DSP cores+accelerator platform in baseband processing. We also enable SPECTRUM to handle dynamic workloads with multiple service categories present in 5G mobile network (Enhanced Mobile Broadband (eMBB), Ultra-reliable and Low-latency Communications (URLLC), and Massive Machine Type Communications (mMTC)), using a run-time scheduling and mapping algorithm. Experimental evaluations show that our algorithm performs task/NoC mapping at run-time on fewer cores compared to the static mapping (that reserves cores exclusively for each service category) while still meeting the differentiated latency and reliability requirements.

Full Text