Abstract

Wireless communication standards such as Long-term Evolution (LTE) are rapidly changing to support the high data-rate of wireless devices. The physical layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing basestation transceivers utilize customized DSP cores or fixed-function hardware accelerators for physical layer baseband processing. However, these approaches incur significant non-recurring engineering costs and are inflexible to newer standards or updates. Software-programmable processors offer more adaptability. However, it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low-power on shared-memory many-core architectures featuring inherently unpredictable design choices, such as caches and Network-on-chip (NoC). We propose SPECTRUM , a predictable, software-defined many-core architecture that exploits the massive parallelism of the LTE/5G baseband processing workload. The focus is on designing scalable lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled NoC that orchestrates the communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared to many-core architecture like Skylake-SP (average power 215 W) that drops 14% packets at high-traffic load, 256-core SPECTRUM by definition has zero packet drop rate at significantly lower average power of 24 W. SPECTRUM consumes 2.11× lower power than C66x DSP cores+accelerator platform in baseband processing. We also enable SPECTRUM to handle dynamic workloads with multiple service categories present in 5G mobile network (Enhanced Mobile Broadband (eMBB), Ultra-reliable and Low-latency Communications (URLLC), and Massive Machine Type Communications (mMTC)), using a run-time scheduling and mapping algorithm. Experimental evaluations show that our algorithm performs task/NoC mapping at run-time on fewer cores compared to the static mapping (that reserves cores exclusively for each service category) while still meeting the differentiated latency and reliability requirements.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call