Future 6G networks are expected to inherit and further develop the Network Function Virtualization (NFV) architecture. With an NFV-enabled network architecture, it becomes possible to establish different virtual networks within the same infrastructure, create different Virtual Network Functions (VNFs) in those virtual networks, and form Service Function Chains (SFCs) that meet different service requirements through the ordered composition of VNFs. These SFCs can be deployed on physical entities as needed to provide network functions that support different services. To meet the highly dynamic service requirements of future 6G Internet of Things (IoT) scenarios, highly flexible and efficient SFC reconfiguration algorithms are a key research direction. Deep-learning-based algorithms have shown their advantages in solving this type of dynamic optimization problem. However, the efficiency of the traditional Actor-Critic (AC) algorithm is limited, because the policy does not directly participate in the value function update. In this paper, we use the Proximal Policy Optimization (PPO) clip function to restrict the difference between the new policy and the old policy, thereby ensuring the stability of the update process. We combine PPO with AC and further incorporate historical decision information as network knowledge to provide better initial policies and thus accelerate training. On this basis, we propose the Knowledge-Assisted Actor-Critic Proximal Policy Optimization (KA-ACPPO)-based SFC reconfiguration algorithm to ensure the Quality of Service (QoS) of end-to-end services. Simulation results show that the proposed KA-ACPPO algorithm can effectively reduce computing cost and power consumption.
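For reference, the PPO clip function mentioned above is commonly written as the clipped surrogate objective below; this is the standard PPO formulation (Schulman et al., 2017) in its usual notation, and the paper's exact variant may differ in details:

\[
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]

where $\hat{A}_t$ is the advantage estimate provided by the critic and $\epsilon$ is the clipping threshold. Clipping the probability ratio $r_t(\theta)$ keeps each new policy close to the old one, which is the stability mechanism the abstract refers to.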