This paper presents an accelerated implementation of the Vector Directional Filter (VDF) on a Field Programmable Gate Array (FPGA) for real-time denoising of color images. The VDF effectively suppresses noise while preserving edges and fine details, making it ideal for a range of applications such as satellite and multispectral biomedical imaging. However, the filter's high computational complexity poses challenges for real-time processing. Existing solutions either fail to meet real-time execution requirements or compromise image quality through hardware implementation approximations. To overcome these challenges, we first model the VDF using C/C++ programming, and subsequently design an efficient floating-point hardware architecture employing the High-Level Synthesis (HLS) flow. Optimal directives are selected using the Xilinx Vivado HLS tool. The VDF architecture is then integrated as a coprocessor with the Cortex-A53 hardcore processor in the XCZU9EG FPGA. To enhance data bandwidth between software and hardware components, three Direct Memory Access (DMAs) units are utilized to transfer three image lines in parallel. Furthermore, internal memory is implemented on the XCZU9EG FPGA, providing increased flexibility for managing the restored image. The VDF Software/Hardware (SW/HW) design's robustness and accuracy are validated through experimental studies on the ZCU102 board. Our design accelerates the filtering process by 21 times, maintaining visual quality and effectively removing noise from color images compared to the VDF SW design. Additionally, our solution outperforms existing approaches in terms of filtered image quality and processing time, showing a 24% improvement in the worst-case scenario.