Modern software data planes use spin polling and batch processing to raise throughput and reduce forwarding latency. Spin polling on user-level I/O queues responds faster than traditional interrupt-driven I/O, and batching lets the data plane reach higher throughput by amortizing I/O overhead across multiple packets. However, a spin-polling data plane runs at full speed regardless of the input traffic rate, which wastes a significant share of CPU capacity. We also find that the batch processing mechanism adapts poorly to varying input traffic, and this shows up chiefly as increased forwarding latency. This paper puts the otherwise wasted capacity to use: we propose a forwarding latency optimization scheme for software data planes built on spin polling. First, the scheme calculates the CPU utilization of the data plane from the number of cycles the CPU spends on useful work. Then, based on this utilization, it controls the Tx queues and dynamically adjusts the output batch size to reduce forwarding latency. Evaluation results show that, compared with the unmodified software data plane, the scheme reduces forwarding latency by 3.56% to 45% in single-queue experiments and by 4.35% to 55.54% in multi-queue experiments.
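The abstract does not give the exact utilization formula or batch-size policy. A minimal sketch, assuming utilization is the fraction of polling-loop cycles spent in iterations that actually processed packets, and assuming a simple halve/double threshold policy, might look like the following. `CycleCounter`, `adjust_tx_batch`, `TxQueue`, the 0.3/0.8 thresholds, and the 1 to 32 batch range are all hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class CycleCounter:
    """Hypothetical per-core cycle counter for one measurement window."""

    useful_cycles: int = 0  # cycles of loop iterations that handled >= 1 packet
    total_cycles: int = 0   # cycles of every loop iteration, busy or empty

    def record(self, cycles: int, packets: int) -> None:
        """Account for one polling-loop iteration."""
        self.total_cycles += cycles
        if packets > 0:
            self.useful_cycles += cycles

    def utilization(self) -> float:
        """Fraction of cycles spent on useful work (0.0 for an empty window)."""
        if self.total_cycles == 0:
            return 0.0
        return self.useful_cycles / self.total_cycles

    def reset(self) -> None:
        self.useful_cycles = 0
        self.total_cycles = 0


def adjust_tx_batch(current: int, util: float,
                    low: float = 0.3, high: float = 0.8,
                    min_batch: int = 1, max_batch: int = 32) -> int:
    """Pick the next output (Tx) batch size from CPU utilization (assumed policy).

    Low utilization means spare cycles: flush smaller batches to cut latency.
    High utilization means the core is busy: grow batches to amortize Tx cost.
    """
    if util < low:
        return max(min_batch, current // 2)
    if util > high:
        return min(max_batch, current * 2)
    return current


class TxQueue:
    """Hypothetical Tx buffer that flushes once `batch` packets are queued."""

    def __init__(self, batch: int) -> None:
        self.batch = batch
        self.pending: list = []
        self.sent_bursts: list = []  # sizes of bursts handed to the NIC

    def enqueue(self, pkt) -> None:
        self.pending.append(pkt)
        if len(self.pending) >= self.batch:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sent_bursts.append(len(self.pending))
            self.pending.clear()


if __name__ == "__main__":
    counter = CycleCounter()
    counter.record(cycles=1000, packets=32)  # busy iteration
    counter.record(cycles=3000, packets=0)   # empty poll
    util = counter.utilization()             # 0.25
    q = TxQueue(batch=adjust_tx_batch(32, util))  # batch shrinks to 16
    for i in range(40):
        q.enqueue(i)
    print(util, q.batch, q.sent_bursts)      # 0.25 16 [16, 16]
```

In a real data plane a shrinking batch would normally be paired with a drain step (for example, flushing pending packets when a poll returns no input) so that packets are never stranded below the threshold; that step is omitted here for brevity.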