ABSTRACTVideo encoding based on novel HEVC standard is an extremely computationally expensive process and achieving efficient encoding requires intelligent utilization of all available resources, from both software and hardware perspective. Profiling and analysis of the encoding process identified Discrete cosine transform (DCT) as one of the key kernels that consume most of the time in the application's runtime. Therefore, high-throughput, fully-pipelined hardware accelerator was designed in FPGA and integrated into MANGO platform. MANGO platform is heterogeneous HPC system that consists of different types of nodes, from general purpose nodes (GN) to heterogeneous nodes (HN). While executing specific kernels on GN nodes is a straight-forward process, executing kernels on accelerator-based HNs is a more complex procedure and requires specific integration to successfully exploit heterogeneous architecture. This paper presents performance-efficient integration of DCT hardware accelerator in MANGO platform, focusing on the performance of the encoder while maintaining coding efficiency and video quality of the encoded bitstream. Several approaches were considered, tested and compared; from the standalone integration where series of single tasks were offloaded to the DCT accelerator, to more complex solutions based on smart buffer utilization.