Sayram: A Hardware-software Co-design to Accelerate Wireless Baseband Processing

Abstract

Micro base stations, with limited antennas and extensive deployment, require scaled-down hardware. Software-defined radio solutions (e.g., CPU, many-core systems, GPU) offer flexibility but incur high area and power costs, while traditional DSP lacks efficient acceleration for smaller configurations. The key challenge for micro base stations is achieving minimal area and power overhead while meeting 5G requirements. This paper presents a hardware-software co-designed architecture, Sayram, which minimizes overhead for 5G physical layer processing. Sayram integrates an instruction fusion mechanism, along with a compiler for simplified programming, a Vector Indirect Addressing Memory (VIAM) to minimize memory access cycles, and an improved vector register design to accelerate small-scale matrix computation, thereby improving overall processor efficiency. Operating at 1 GHz, Sayram achieves 158 GOPS with a 1.18 mm² area, supporting 2T2R and 4T4R Physical Uplink Shared Channel (PUSCH) processing in single-core and dual-core modes, respectively. Evaluations show that Sayram’s area efficiency is 3× and 9× higher than that of traditional DSP and CGRA architectures, respectively, with power efficiency improvements of 44× and 6×. Sayram’s energy and area efficiency surpass CPU solutions by orders of magnitude.
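
As a rough worked example of the headline figures, the sketch below derives throughput per unit area and operations per clock cycle from the numbers quoted in the abstract. It assumes that "area efficiency" means peak throughput divided by area; the paper may define or normalize the metric differently, and the variable names are illustrative only.

```python
# Figures quoted in the abstract.
throughput_gops = 158.0   # peak throughput in GOPS
area_mm2 = 1.18           # reported area in mm^2
clock_ghz = 1.0           # operating frequency in GHz

# ASSUMPTION: area efficiency = peak throughput / area.
area_efficiency = throughput_gops / area_mm2   # GOPS per mm^2

# Operations completed per clock cycle at the stated frequency.
ops_per_cycle = throughput_gops / clock_ghz

print(f"Area efficiency: {area_efficiency:.1f} GOPS/mm^2")
print(f"Operations per cycle: {ops_per_cycle:.0f}")
```

Under that assumption, the quoted figures work out to roughly 134 GOPS/mm² and 158 operations per cycle.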

Similar Papers
  • Conference Article
  • Cite count: 17
  • 10.1109/vtc2021-spring51267.2021.9449057
Two-Step Random Access in 5G New Radio: Channel Structure Design and Performance
  • Apr 1, 2021
  • Elena Peralta + 3 more

A common design of the random access procedure on the physical random access channel (PRACH) is required for the diverse usage scenarios in the fifth generation new radio (5G NR) mobile networks. Based on the latest 3GPP specifications and evaluation assumptions agreed for Release 16, the 2-step RACH (2SR) enhancement, composed of MsgA and MsgB, reduces not only the latency but also the control-signalling overhead, owing to the reduced number of messages transmitted. The channel structure of MsgA comprises the RACH preamble and data in the physical uplink shared channel (PUSCH), while MsgB combines the random access response and the contention resolution. This procedure should operate in local area (LA), medium range (MR) and wide area (WA) cells despite the lack of time alignment (TA) in the PUSCH part of MsgA. The demodulation performance degradation observed without time offset compensation at the base station (gNB), especially for MR or WA cells, highlights that practical gNB implementations relying on a MAC control element-based TA command for PUSCH time alignment are not conceivable for 2SR. Furthermore, if all preambles from multiple users (UEs) attempting initial access are mapped to the same PUSCH physical resources, the associated data parts overlap and may result in unsuccessful decoding. There is therefore a trade-off between the collision probability of the PUSCH part of MsgA and the resource overhead for 2SR. This paper addresses the channel structure design of this procedure for the preamble and data parts of MsgA together with the receiver processing framework. The performance results suggest that using lower payload sizes provides higher resource utilization and allows more UEs to be multiplexed within the same PUSCH occasion. In addition, using different DMRS ports for UEs sharing the same physical resources decreases the probability of failure in decoding the data part of MsgA while reducing the resource overhead for 2SR.
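
The trade-off between collision probability and resource overhead mentioned above can be illustrated with a simple counting model. This is a minimal sketch, not the paper's analysis: it assumes each UE independently and uniformly picks one of a fixed number of PUSCH resource units, and the function and parameter names are hypothetical.

```python
def collision_probability(num_ues: int, num_resources: int) -> float:
    """Probability that at least two UEs pick the same resource unit.

    ASSUMPTION: each UE chooses uniformly and independently among
    `num_resources` units (a birthday-problem model).
    """
    if num_ues > num_resources:
        return 1.0  # pigeonhole: some unit must be shared
    p_no_collision = 1.0
    for i in range(num_ues):
        p_no_collision *= (num_resources - i) / num_resources
    return 1.0 - p_no_collision


# More resource units lower the collision probability but cost more overhead.
for units in (4, 8, 16, 32):
    print(units, round(collision_probability(4, units), 3))
```

In this model, provisioning more resource units reduces collisions at the cost of reserving more uplink resources, which is the tension the abstract describes.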

  • Conference Article
  • Cite count: 2
  • 10.1109/gpecom49333.2020.9247906
Throughput Analysis over 5G NR Physical Uplink Shared Channels
  • Oct 20, 2020
  • Yasin Kabalci + 1 more

High throughput with low latency, massive device connectivity, and effective spectrum utilization for current wireless communication systems can be realized by adopting the fifth-generation (5G) new radio (NR) air interface. The key features that 5G NR presents are Ultra-Reliable Low-Latency Communications (URLLC), enhanced Mobile Broadband (eMBB), and massive Machine Type Communications (mMTC). To support these features, 5G NR employs different multiple access and modulation techniques. This paper addresses the physical layer of 5G NR and, more specifically, explores transmission over the physical uplink shared channel (PUSCH) considering several parameters. For example, different sub-carrier spacings (SCSs) are taken into account when analyzing the performance of PUSCH in terms of throughput versus Signal-to-Noise Ratio (SNR). Moreover, the effect on throughput of well-known modulation techniques such as Quadrature Phase Shift Keying (QPSK) and different orders of Quadrature Amplitude Modulation (QAM) (i.e., 16, 64, and 256) is studied. The numbers of base station (BS) and user equipment (UE) antennas are then varied. Lastly, the performance of PUSCH over different propagation channel models (clustered delay line (CDL) and tap delay line (TDL)) is also investigated. Extensive simulation studies show that QPSK gives better results in low SNR regions, while 256-QAM performs best in high SNR regions. The maximum throughput can be realized even in the low SNR regime if the number of BS antennas is increased. In addition, high throughput can be attained by increasing the SCS.

  • Conference Article
  • Cite count: 4
  • 10.23919/date56975.2023.10137247
Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-Core Processor
  • Apr 1, 2023
  • Marco Bertuletti + 3 more

5G radio access network disaggregation and softwarization pose challenges in terms of computational performance to the processing units. At the physical layer level, the baseband processing computational effort is typically offloaded to specialized hardware accelerators. However, the trend toward software-defined radio-access networks demands flexible, programmable architectures. In this paper, we explore the software design, parallelization and optimization of the key kernels of the lower physical layer (PHY) for physical uplink shared channel (PUSCH) reception on MemPool and TeraPool, two many-core systems having respectively 256 and 1024 small and efficient RISC-V cores with a large shared L1 data memory. PUSCH processing is demanding and strictly time-constrained; it represents a challenge for baseband processors, and it is also common to most of the uplink channels. Our analysis thus generalizes to the entire lower PHY of the uplink receiver at the gNodeB (gNB). Based on the evaluation of the computational effort (in multiply-accumulate operations) required by the PUSCH algorithmic stages, we focus on the parallel implementation of the dominant kernels, namely fast Fourier transform, matrix-matrix multiplication, and matrix decomposition kernels for the solution of linear systems. Our optimized parallel kernels achieve speedups of 211, 225, 158 on MemPool and 762, 880, 722 on TeraPool, at high utilization (0.81, 0.89, 0.71, and 0.74, 0.88, 0.71, respectively), compared to a single-core serial execution, moving a step closer toward a full-software PUSCH implementation.

  • Conference Article
  • 10.1109/tsp.2015.7296343
Analysis and simulation of aperiodically reported signaling in LTE uplink
  • Jul 1, 2015
  • Jiri Milos + 2 more

This paper deals with the transmission of channel state information in the uplink of the Long Term Evolution FDD mobile standard (Release 8), with a focus on the possibility of aperiodic transmission of this information using the Physical Uplink Shared Channel (PUSCH). It is necessary to ensure high-quality transmission of feedback information about channel conditions between the User Equipment (UE) and the Base Station (BS) in both directions. We present a comprehensive analysis of the signal processing of control information transmitted aperiodically via PUSCH, together with a performance analysis. The simulation results demonstrate the robustness of the control information transmitted via PUSCH in various channel conditions and system settings, and yield the minimum required SNR values for reliable system operation according to 3GPP requirements.

  • Research Article
  • Cite count: 13
  • 10.1109/access.2020.2972064
Non-Orthogonal Random Access and Data Transmission Scheme for Machine-to-Machine Communications in Cellular Networks
  • Jan 1, 2020
  • IEEE Access
  • Yali Wu + 2 more

In order to address the signalling overhead and resource allocation problems of Machine-to-Machine (M2M) communications with non-orthogonal multiple access (NOMA), we propose a hybrid non-orthogonal random access and data transmission (NORA-DT) scheme. A novel design of the NORA-DT protocol for M2M communications in cellular networks is first proposed. A power back-off scheme is introduced to adjust the target arrived power of each machine-type communications device (MTCD), and a closed-form analytic formula for the relation between the MTCDs' transmission powers is derived. Based on this transmission power relation, the devices are clustered into a set of NOMA clusters. In the hybrid NORA-DT protocol, the cluster center MTCD transmits an extended preamble on behalf of the MTCDs in the same NOMA cluster on the physical random access channel (PRACH) for the connection request. The base station (BS) can perfectly detect preamble collisions in advance and schedules the physical uplink shared channel (PUSCH) only to the NOMA clusters without collision. The MTCDs in the same NOMA cluster then transmit data packets on the PUSCH right after preamble transmission to reduce the signalling overhead. By finding the optimal power allocation, we formulate a low-complexity energy efficiency maximization problem for the NORA-DT scheme. Using the relation between the MTCDs' transmission powers, we transform the problem into a function of the cluster center MTCD's transmission power and solve it by difference of convex (DC) programming under the maximum transmission power constraints and minimum rate requirements at the MTCDs. A computationally efficient adaptive resource allocation scheme is finally proposed to improve the system throughput and resource usage. The optimal resource allocation between PRACH and PUSCH for any number of MTCDs can be learned by the BS in advance, which avoids frequent computation. The analytic model is validated by simulation results.
We demonstrate that the proposed NORA-DT scheme can significantly improve the system throughput, resource efficiency and energy efficiency performance.

  • Conference Article
  • Cite count: 1
  • 10.1109/cicn.2015.23
An Advanced Power Control Algorithm Based on PHR in LTE-A PUSCH
  • Dec 1, 2015
  • Zhitao Yang + 3 more

Uplink power control is effective in compensating for the path loss of intra-cell users, mitigating inter-cell interference, and improving cell coverage and system throughput in LTE-A systems. Because existing schemes process the power headroom report (PHR) and the target SINR unreasonably, this paper proposes an optimized closed-loop power control algorithm for the physical uplink shared channel (PUSCH) based on the PHR. In the developed algorithm, the base station first calculates the power of each RB after each PH value is received; this power is then filtered and processed to obtain the closed-loop power adjustment. Meanwhile, the update range of the target SINR is set. The simulation results show that, compared to the existing PHR-based power control scheme, the improved one adjusts the transmit power better and achieves higher data throughput for cell-edge users.

  • Research Article
  • 10.1587/transfun.e96.a.2106
Experiments on Asymmetric Carrier Aggregation Associated with Control Signaling Reception Quality in LTE-Advanced
  • Jan 1, 2013
  • IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
  • Keisuke Saito + 5 more

LTE-Advanced supports asymmetric carrier aggregation (CA) to achieve flexible bandwidth allocation by applying different numbers of component carriers (CCs) between the downlink and uplink. This paper experimentally clarifies the achievable downlink throughput performance when the uplink control information (UCI) feedback mechanism using the physical uplink shared channel (PUSCH), which enables minimization of the UCI overhead while maintaining the required reception quality, is applied in asymmetric CA. The laboratory experimental results show that stable reception quality control of the channel quality information (CQI) with a target block error rate (BLER) of 10⁻¹ to 10⁻² is achieved irrespective of the average received signal-to-noise power ratio (SNR) when a control offset parameter of approximately 1.25 is used. We also show that the achievable downlink throughput when the CQI error is considered is almost the same as that in the no-CQI-error case. Furthermore, based on experimental results in a real field environment, a suburban area of Yokosuka city in Japan, we confirm stable adaptive modulation and coding (AMC) operation including target BLER control of the CQI on the PUSCH in asymmetric CA. The field experimental results also show that when CA with 5 CCs (90-MHz bandwidth) and 2-by-2 rank-2 multiple-input multiple-output (MIMO) multiplexing are employed in the downlink, a peak throughput of approximately 640 Mbps is achieved even considering the CQI error.

  • Conference Article
  • 10.1109/icwoc52624.2021.9530217
Reconnaissance and Experiment on 5G-SA Communication Terminal Capability and Identity Information
  • Jun 4, 2021
  • Cong Wu + 2 more

With the rapid development of mobile communication technology, reconnaissance of terminal capability and identity information is not only an important guarantee for maintaining the normal order of mobile communication, but also an essential means of ensuring electromagnetic space security. Based on how a 5G mobile communication terminal transports its capability and identity information, smart jamming is first used to force the target terminal off the 5G network, and the jamming is then turned off immediately, after which the terminal returns to the 5G network. Through a time-frequency matching detection method, the interactive signals of the random access process and network registration between the terminal and the base station are quickly captured during this process, and the scheduling information in the Physical Downlink Control Channel (PDCCH) and the capability and identity information in the Physical Uplink Shared Channel (PUSCH) are demodulated and decoded under non-cooperative conditions. Finally, an experiment is carried out on an actual 5G communication terminal of China Telecom. The capability and identity information of this terminal are extracted successfully in Stand Alone (SA) mode, which verifies the effectiveness and correctness of the method. This is a significant technical foundation for the subsequent development of 5G terminal control equipment.

  • Conference Article
  • Cite count: 9
  • 10.1109/ahs.2010.5546233
Adaptive multicore scheduling for the LTE uplink
  • Jun 1, 2010
  • Maxime Pelcat + 2 more

Long Term Evolution (LTE) is the next-generation cellular system of 3GPP, in which, every subframe (1 millisecond duration), a base station receives information from up to one hundred users. Multicore heterogeneous embedded systems with Digital Signal Processors (DSPs) and coprocessors are power-efficient solutions that decode the LTE uplink signals and encode the LTE downlink signals in base stations. The LTE Physical Uplink Shared Channel (PUSCH) uses a dynamic algorithm, as its multicore scheduling must be adapted every subframe to the number of transmitting users and to the data rate of the services they require. To solve this particular issue of dynamic deployment while maintaining low latency, one approach is to find efficient on-the-fly solutions using techniques such as graph generation and scheduling. This approach contrasts with fully static scheduling of predefined cases, the approach currently used in UMTS deployments. We show that the fully static approach is not suitable for the LTE PUSCH and that present DSP cores are powerful enough to recompute an efficient adaptive schedule for the application's most complex cases in real time.

  • Conference Article
  • Cite count: 8
  • 10.1109/icct50939.2020.9295747
5G NR Uplink Coverage Enhancement Based on DMRS Bundling and Multi-slot Transmission
  • Oct 28, 2020
  • Zhiliang Guo + 2 more

With the deployment of fifth-generation base stations, the improvement of cell edge coverage has become an important issue that needs to be solved. Compared with the downlink coverage, the uplink coverage limitation is more serious. Through the simulation analysis of the coverage performance of physical uplink shared channel (PUSCH), physical uplink control channel (PUCCH), and physical random access channel (PRACH), it is found that the coverage of PUSCH is the most severely restricted. This article analyzes the impact of the demodulation reference signal (DMRS) density and uplink waveform on PUSCH coverage. We find that adding DMRS in the same slot can improve the performance of channel estimation. Specifically, we propose to use DMRS bundling with the same or coherent DMRS being sent in multiple time slots for coverage enhancement. The receiver performs joint channel estimation on the DMRS in multiple time slots to improve the accuracy of channel estimation and enhance the coverage. Besides, a multi-slot repetitive transmission scheme is proposed, where the same data is repeatedly transmitted in multiple slots. Joint channel estimation and independent channel estimation are performed for this scheme separately. Our simulation results show that the proposed schemes can improve the coverage effectively.
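
The benefit of joint channel estimation over bundled DMRS can be illustrated with a toy model. The sketch below is not the paper's receiver: it assumes a single flat channel coefficient that stays constant (coherent) across the bundled slots, a unit-power pilot, and additive Gaussian noise, and all names are hypothetical. Averaging the per-slot least-squares estimates then reduces the estimation error roughly in proportion to the number of slots.

```python
import random


def ls_estimate(h, pilot, noise_std, rng):
    """Least-squares channel estimate from one received pilot y = h*pilot + n."""
    noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
    y = h * pilot + noise
    return y / pilot


def joint_estimate(h, pilot, noise_std, num_slots, rng):
    """Average per-slot estimates; ASSUMES h is identical in every bundled slot."""
    estimates = [ls_estimate(h, pilot, noise_std, rng) for _ in range(num_slots)]
    return sum(estimates) / num_slots


def mean_squared_error(h, num_slots, trials=2000, noise_std=0.5, seed=0):
    """Monte Carlo estimate of the mean squared channel estimation error."""
    rng = random.Random(seed)
    pilot = complex(1, 0)  # unit-power pilot symbol (assumption)
    total = 0.0
    for _ in range(trials):
        est = joint_estimate(h, pilot, noise_std, num_slots, rng)
        total += abs(est - h) ** 2
    return total / trials


h_true = complex(0.8, -0.6)
mse_single = mean_squared_error(h_true, num_slots=1)
mse_bundled = mean_squared_error(h_true, num_slots=4)
print(f"single-slot MSE: {mse_single:.3f}, 4-slot bundled MSE: {mse_bundled:.3f}")
```

The gain in this model depends entirely on the channel remaining coherent across slots; with a time-varying channel, plain averaging would no longer be appropriate.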

  • Conference Article
  • Cite count: 5
  • 10.1109/icdsp.2015.7251870
EPUMA: A processor architecture for future DSP
  • Jan 1, 2015
  • Andreas Karlsson + 2 more

Since the breakdown of Dennard scaling, the primary design goal for processor designs has shifted from increasing performance to increasing performance per Watt. The ePUMA platform is a flexible and configurable DSP platform that tries to address many of the problems with traditional DSP designs, to increase performance while using less power. We trade the flexibility of traditional VLIW DSP designs for a simpler single-instruction-issue scheme and instead make sure that each instruction can perform more work. Multi-cycle instructions can operate directly on vectors and matrices in memory, and the datapaths implement common DSP subgraphs directly in hardware, for high compute throughput. Memory bottlenecks, which are common in other architectures, are handled with flexible LUT-based multi-bank memory addressing and memory parallelism. A major contributor to energy consumption, data movement, is reduced by using heterogeneous interconnect and clustering compute resources around local memories for simple data sharing. To evaluate ePUMA we have implemented the majority of the kernel library from a commercial VLIW DSP manufacturer for comparison. Our results not only show good performance, but also an order of magnitude increase in energy and area efficiency. In addition, the kernel code size is reduced by 91% on average compared to the VLIW DSP. These benefits make ePUMA an attractive solution for future DSP.

  • Conference Article
  • Cite count: 9
  • 10.1109/padsw.2014.7097922
RBPP: A row based DRAM page policy for the many-core era
  • Dec 1, 2014
  • Xiaowei Shen + 4 more

Memory requests in many-core systems are interleaved with each other, and the locality of memory accesses decreases heavily. Page policies from traditional single-core systems are not effective in many-core systems, because the open-page policy needs high locality of memory requests and the close-page policy takes no advantage of the remaining locality. There are some related memory page management policies, but their high complexity makes them unsuitable for many-core systems: they either need too much modification to operating systems or have large area and power overhead. To overcome these shortcomings of current page policies, we propose the row based page policy (RBPP) for many-core systems, which tracks the row addresses of memory requests to each bank and uses row addresses as the indicator to decide whether or not to close the row buffer when the active memory request finishes. We evaluate the proposed RBPP via Gem5 and DRAMSim2, and the results show that the row based page policy decreases the average memory latency by 14.7% and 4.0% over the open-page policy and the close-page policy, respectively. The area overhead of the row based page policy is 91.4% and 91.5% lower than that of the access based page policy and the two-level predictor page policy, respectively.

  • Research Article
  • 10.1049/iet-cdt.2018.5015
Soft‐error reliable architecture for future microprocessors
  • Mar 5, 2019
  • IET Computers & Digital Techniques
  • Shoba Gopalakrishnan + 1 more

A transient error is a failure of the device due to transient hardware faults caused by high-energy particles such as neutron and alpha particle strikes. In this study, the authors propose two schemes of fault-tolerant architecture. The first scheme is a hardware-based solution called REMO that combines the best features of space and time redundancy. REMO provides very high fault coverage with minimum overheads in performance, power and area. The second scheme, REMORA, combines the best features of hardware and software approaches to fault tolerance. The persistent issue of unprotected code that exists in software approaches is eliminated in this proposal. Simulation results from a SPEC2006 benchmark suite indicate that REMO incurs an area increase of about 6%, a power overhead of 9% in spite of redundant execution, and a negligible performance penalty during a fault-free run. In REMORA, performance degradation increases to 12%, and the code size inflation is close to 12%. This is due to the additional signature instructions inserted into the application program. In this study, the authors have explored the possibility of eliminating this penalty by embedding the signatures in control flow instructions. The power and area overhead of REMORA is on par with REMO.

  • Research Article
  • Cite count: 1
  • 10.1109/access.2021.3064321
Small-Cell Assisted Group Paging for Massive MTC in LTE Networks: Design and Analysis
  • Jan 1, 2021
  • IEEE Access
  • Anh-Tuan H Bui + 3 more

Long-Term Evolution cellular networks are the main enabler for the massive Machine-Type Communications service and therefore must handle a large number of Machine-Type Devices (MTDs). To control the number of devices allowed to contend on the Physical Random Access Channel (PRACH), the group paging scheme that divides the MTDs into smaller groups and lets the network sequentially trigger the groups has been studied. However, as the number of PRACH preambles is limited, a group’s size must be kept relatively small compared to the MTD population. This paper exploits the possibility that a significant portion of the MTDs is also covered by densely deployed small-cells such that a Small-cell Base Station (SBS) may act as a representative for its MTDs during the preamble transmission step to reduce the load on PRACH. Once the SBS succeeds, its MTDs then contend locally to send their own signaling messages on the corresponding reserved uplink resources. Computer simulations show that the manageable group size can be significantly increased at a reasonable cost on the Physical Uplink Shared Channel. A theoretical model to quickly predict the effect of the ratio of MTDs that are under the coverage of the SBSs is also derived and verified.

  • Conference Article
  • Cite count: 2
  • 10.1109/radioelek.2014.6828430
Simulation of UCI transmission via PUCCH in LTE uplink
  • Apr 1, 2014
  • Jiri Milos + 1 more

The Long Term Evolution (LTE) cellular network has limited uplink resources for transmitting uplink signalling information. The Physical Uplink Shared Channel (PUSCH) and the Physical Uplink Control Channel (PUCCH) cannot be transmitted simultaneously in the same subframe. PUCCH is intended for a large number of user equipment (UE) and a short Uplink Control Information (UCI) codeword. The base station expects PUCCH data from different UEs in the same set of resources. In this paper, we present a description of PUCCH signal processing and the developed MATLAB link-level LTE uplink control channel simulator. Results from a complete PUCCH performance analysis for different PUCCH payloads in the AWGN channel using receive diversity are included.
