The deblocking filter (DF) and the sample adaptive offset (SAO) filter, which aids in enhancing the subjective quality of the image, make up the in-loop filter of the high-efficiency video coding (HEVC) encoder and decoder. The in-loop filter significantly increases the computational load on the HEVC encoder. It is challenging to design an in-loop filter on hardware that can handle intensive computations while using the least amount of on-chip memory, taking external memory traffic and dependencies simultaneously delivering high throughput to support Ultra HD video applications. The proposed design employs the following strategies to address these issues. This work proposes an address generation technique for pipelined horizontal and vertical filtering in DF, that avoids a transpose buffer which otherwise is required. This enables easy pipelining and parallelization thus improving throughput while reducing the on-chip memory utilization. A simplified SAO filter with parallel-pipelined processing is included in the design. These features enable the design to support ultra-HD 7680 × 4320 @ 40 fps video applications. The proposed hardware architecture has a total gate count of 7.73 K LUTs and 2.8 K slice registers, and it is implemented on a 28 nm field programmable gate array (FPGA) platform.