This paper proposes a new implementation of the PID (proportional–integral–derivative) algorithm in digital hardware. The proposed structure is optimized for cost: it follows a serialized, rather than parallel, scheme and uses a single arithmetic block that performs the multiply-and-add operation, with the calculations carried out sequentially in a repeating cycle. The circuit operates on standard single-precision (32-bit) floating-point numbers. It implements an extended PID formula that contains a non-ideal derivative component and weighting coefficients, which make it possible to reduce the influence of setpoint changes on the proportional and derivative components. The circuit was implemented in a Cyclone V FPGA (Field-Programmable Gate Array) device from Intel, Santa Clara, CA, USA, and its proper operation was verified in simulation. The specific implementation reported in the paper achieves a sampling period of 516 ns, which makes the proposed solution comparable in speed to other hardware implementations of the PID algorithm that operate on single-precision floating-point numbers. The presented solution is, however, much more efficient in terms of cost: it uses 1173 LUT (Look-up Table) blocks, 1026 registers, and 1 DSP (Digital Signal Processing) block, i.e., about 30% of the logic resources required by comparable solutions.
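
The serialized scheme described above can be illustrated with a short software sketch in which every arithmetic step goes through one multiply-and-add routine, mirroring the idea of a single shared arithmetic block. The sketch is not the circuit reported in the paper. It assumes a common textbook form of the extended PID formula (setpoint weights `b` and `c`, derivative filter constant `n`) discretized with the backward Euler method; the names `mac` and `SerialPID`, the parameter names, and the order of operations are all hypothetical. Python floats are double precision, so the sketch also does not reproduce the single-precision arithmetic of the hardware.

```python
def mac(a, b, c):
    """One multiply-and-add operation: a * b + c."""
    return a * b + c


class SerialPID:
    """Software model of a PID step built only from sequential mac() calls.

    Assumed (not taken from the paper): backward-Euler discretization of
    u = kp * [(b*r - y) + (r - y)/(ti*s) + td*s/(1 + td*s/n) * (c*r - y)].
    """

    def __init__(self, kp, ti, td, n, b, c, h):
        self.kp = kp
        self.b = b                                # setpoint weight, proportional part
        self.c = c                                # setpoint weight, derivative part
        # Coefficients are precomputed once; h is the sampling period.
        self.ki = kp * h / ti                     # integral gain per sample
        self.ad = td / (td + n * h)               # derivative filter pole
        self.bd = kp * td * n / (td + n * h)      # derivative gain
        self.integral = 0.0
        self.deriv = 0.0
        self.prev_ed = 0.0

    def step(self, r, y):
        """Compute one control output from setpoint r and measurement y."""
        ep = mac(self.b, r, -y)                   # weighted error for the P term
        ed = mac(self.c, r, -y)                   # weighted error for the D term
        e = mac(1.0, r, -y)                       # plain error for the I term
        self.integral = mac(self.ki, e, self.integral)
        delta = mac(-1.0, self.prev_ed, ed)       # ed[k] - ed[k-1]
        scaled = mac(self.bd, delta, 0.0)
        self.deriv = mac(self.ad, self.deriv, scaled)  # filtered (non-ideal) derivative
        self.prev_ed = ed
        u = mac(self.kp, ep, self.integral)       # P + I
        return mac(1.0, self.deriv, u)            # P + I + D


if __name__ == "__main__":
    pid = SerialPID(kp=2.0, ti=1.0, td=0.5, n=10.0, b=0.0, c=0.0, h=0.1)
    print(pid.step(1.0, 0.0))
```

With `b = 0` and `c = 0`, a step in the setpoint `r` reaches the output only through the integral term, which is the effect of the weighting coefficients mentioned above. Each call to `step` issues nine multiply-and-add operations one after another; in a hardware realization these would correspond to successive cycles of the shared arithmetic block, although the actual operation count and schedule of the reported circuit may differ.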