On-chip data organization and access strategy for spaceborne SAR real-time imaging processor

Shiyu Wang, Hao Lyu, Shengbing Zhang, Xiaoping Huang

doi:10.1051/jnwpu/20213910126

Abstract

Spaceborne SAR (synthetic aperture radar) imaging requires real-time processing of an enormous amount of input data under a limited power budget. Designing advanced heterogeneous array processors is an effective way to meet both the power constraints and the real-time processing requirements of such systems. For an efficient SAR imaging processor, the on-chip data organization structure and access strategy are of critical importance. Taking a typical SAR imaging algorithm, the chirp scaling algorithm, as the target, this paper analyzes the characteristics of each calculation stage in the SAR imaging process, extracts the data flow model of SAR imaging, and proposes a storage strategy of cross-region cross-placement with data sorting executed synchronously, so that the FFT/IFFT calculations can run as a parallel pipeline. An on-chip multi-level data buffer structure alleviates the memory wall problem and keeps the imaging calculation pipeline supplied with sufficient data. Based on this memory organization and access strategy, the SAR imaging pipeline, which effectively supports FFT/IFFT and phase compensation operations, is optimized. A processor based on this storage strategy achieves a throughput of up to 115.2 GOPS and an energy efficiency of up to 254 GOPS/W when implemented in 65 nm technology. Compared with conventional CPU+GPU acceleration solutions, the performance-to-power ratio is increased by 63.4 times. The proposed architecture not only improves real-time performance but also reduces the design complexity of the SAR imaging system, and it tailors and scales well, satisfying the practical needs of different SAR imaging platforms.

Highlights

an effective way to meet the requirements of power constraints and real-time processing of application systems
the on-chip data organization structure and access strategy are of critical importance
this paper analyzes the characteristics of each calculation stage engaged in the SAR imaging process

Summary

CSA model

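Compared with [...] algorithms, the CSA has a simple procedure, low computational complexity and high imaging efficiency. In addition, the CSA improves image fidelity, in particular the preservation of phase information. It also adapts to different radar scanning modes, such as spotlight, stripmap, ScanSAR and sliding spotlight [5].

Figure 1 shows the imaging principle of the CSA. The CSA can be divided into 3 modules by function, or into 7 steps by order of operation. The algorithm executes step by step, alternating FFT/IFFT and phase compensation operations. As Figure 1 shows, a complete SAR imaging pass requires 4 Fourier transforms and 3 phase compensation operations.

Figure 1 also shows that the CS algorithm involves multiple FFT/IFFT operations. Imaging begins with an azimuth FFT, which transforms the echo data into the signal-Doppler domain, where the range migration curves are corrected. The data matrix then undergoes [...] IFFT, which returns the data to the two-dimensional time domain and completes the imaging process. The chirp scaling (CS) correction at each stage has a coarse parallel granularity, [...] fully exploit its parallelism and improve computational efficiency.

1.2 Computational characteristic analysis

In SAR imaging based on the chirp scaling algorithm, the core computation consists mainly of FFT/IFFT processing and correction-factor operations. Studies have also found that performing FFT/IFFT in lower bit-width fixed point during imaging costs very little image precision while significantly improving [...]; [...] imaging quality is very close to that of single-precision floating-point imaging. The phase compensation results, however, require higher precision and must use floating-point arithmetic. Therefore, based on [...]

Analysis of the FFT computation shows that an N-point FFT/IFFT decomposes into 2N log2 N real multiplications and 3N log2 N real additions. Table 1 lists the computation required by each step when imaging an M × N scene.

From Table 1, the share W of the total computation taken by FFT/IFFT follows as Equation (1). W varies slightly with the size of the imaging matrix, as listed in Table 2. Table 2 shows that W generally exceeds 90% and reaches up to 95% as the matrix grows. Therefore, targeting the computational characteristics of FFT/IFFT, optimizing its memory access and accelerating FFT/IFFT operations [...]

During computation, the echo data enter the imaging pipeline as the raw input and are combined with twiddle factors in the first azimuth FFT, whose results are passed to the second stage. The second stage introduces the CS correction factor and applies it to the first-stage results as a phase multiplication. The second-stage results flow to the next stage; the third stage performs the range FFT, whose computation and parameter loading resemble those of the first-stage azimuth FFT. Subsequent stages repeat this data flow until the final-stage range IFFT completes and the whole imaging computation is done. Throughout the algorithm, the positions of the compute nodes are fixed, and the data flow has no branches and keeps the same pattern.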
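The alternation of transforms and phase multiplications described above can be sketched in a few lines of Python. This is only an illustration, not the processor's implementation: the function names are hypothetical, a naive DFT stands in for the FFT core, rows are assumed to index azimuth and columns range, and the stage order is the textbook chirp scaling order (an assumption, since Figure 1 is not reproduced here). The phase factors are passed in as plain matrices rather than derived from radar parameters.

```python
import cmath


def dft(vec, inverse=False):
    """Naive O(n^2) DFT of a list of complex samples (stand-in for an FFT core)."""
    n = len(vec)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(vec):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def transform(matrix, axis, inverse=False):
    """Transform along columns (axis=0, assumed azimuth) or rows (axis=1, assumed range)."""
    if axis == 1:
        return [dft(row, inverse) for row in matrix]
    cols = [dft(list(col), inverse) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


def phase_multiply(matrix, factors):
    """Element-wise complex multiplication with a phase-factor matrix."""
    return [[x * f for x, f in zip(row, frow)] for row, frow in zip(matrix, factors)]


def csa_pipeline(echo, cs_factor, range_factor, azimuth_factor):
    """Seven stages: 4 transforms alternating with 3 phase multiplications."""
    data = transform(echo, axis=0)                 # 1. azimuth FFT
    data = phase_multiply(data, cs_factor)         # 2. CS correction factor
    data = transform(data, axis=1)                 # 3. range FFT
    data = phase_multiply(data, range_factor)      # 4. range compensation factor
    data = transform(data, axis=1, inverse=True)   # 5. range IFFT
    data = phase_multiply(data, azimuth_factor)    # 6. azimuth compensation factor
    data = transform(data, axis=0, inverse=True)   # 7. azimuth IFFT
    return data
```

Every stage consumes exactly the matrix produced by the previous one, which is the branch-free, fixed-node data flow the text describes. With all three factor matrices set to ones, the four transforms cancel and the pipeline returns its input, which is a convenient sanity check.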
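To make the operation counts concrete, the following sketch estimates the FFT/IFFT share W for an M × N scene. Only the per-transform counts (2N log2 N real multiplications and 3N log2 N real additions) come from the text. Equation (1) and Tables 1 and 2 are not reproduced here, so the phase compensation cost is an assumption: one complex multiplication per sample, counted as 4 real multiplications plus 2 real additions. The function names are hypothetical, and the resulting values are illustrative, not the paper's Table 2 entries.

```python
import math


def fft_real_ops(n):
    """Real operations in one n-point FFT/IFFT: 2n*log2(n) multiplications
    plus 3n*log2(n) additions, as stated in the text."""
    stages = math.log2(n)
    return 2 * n * stages + 3 * n * stages


def fft_share(m, n, phase_ops_per_sample=6):
    """Estimated share W of FFT/IFFT work for an m x n scene.

    Assumes 2 azimuth transforms (n transforms of m points each), 2 range
    transforms (m transforms of n points each), and 3 phase compensations
    at phase_ops_per_sample real operations per sample (assumed value).
    """
    azimuth = 2 * n * fft_real_ops(m)
    rng = 2 * m * fft_real_ops(n)
    fft_total = azimuth + rng
    phase_total = 3 * m * n * phase_ops_per_sample
    return fft_total / (fft_total + phase_total)


if __name__ == "__main__":
    for size in (1024, 4096, 16384):
        print(size, round(fft_share(size, size), 3))
```

Under these assumptions W simplifies to 10 log2(MN) / (10 log2(MN) + 18), so a 4096 × 4096 scene gives 240/258, about 0.93, and W grows slowly with scene size. That trend agrees with the statement that W generally exceeds 90%.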

SAR real-time imaging processor design
