This paper proposes a dedicated hardware architecture for the Low-Frequency Non-Separable Transform (LFNST) of the Versatile Video Coding (H.266/VVC) standard. The VVC defines two stages of transformation, where the first stage uses traditional transform types (e.g. DCT-II, DCT-VII and DST-VII), while the secondary transform stage applies the LFNST. The LFNST is used to transform the coefficients that were transformed by the DCT-II in the primary transform, but only those from the residues that came from the intra prediction. The developed LFNST system design exploits the Clock Crossing Domain technique to extract the best relation between performance and area/power. Consequently, the design operates with two clock domains, where the core operates at a four times higher frequency than the primary transform. The ASIC synthesis results for a TSMC 40nm standard-cells library indicate that our design can process UHD 4K videos at 120 frames per second while using an area of 69.68 Kgates, and with a power dissipation of 40.46 mW. When compared with related works, our design presented the lowest power dissipation and energy consumption per sample.
Read full abstract