Circuit structures optimized with traditional methods often struggle to meet complex and changing design requirements. In this paper, the A3C (asynchronous advantage actor-critic) algorithm is applied to circuit structure optimization, integrating policy learning and value learning. This integration allows the agent to interact continuously with the environment and to adjust the circuit structure automatically to meet complex design requirements. Gain, bandwidth, delay, and power consumption are set as the optimization objectives, and the actions of the agent, which include adding, deleting, and modifying connection lines and adjusting component parameters, are described in detail. Once the Actor and Critic networks have been established, multiple agents operate concurrently, and the optimization objectives are translated into reward signals that provide direction and motivation for agent learning. The proposed method is then used to design the circuit structure of a switching audio power amplifier in a simulation environment. The structure optimization results show that, at convergence of the A3C algorithm, the gain reaches 78.4 dB and the bandwidth reaches 156.2 MHz, while both circuit delay and power consumption are reduced significantly. These results indicate that the A3C algorithm can effectively optimize circuit structures, offering a more flexible and efficient solution.
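As an illustration of how the four optimization objectives might be translated into a single reward signal, the following is a minimal sketch rather than the formulation used in this paper. The weighted-sum form, the unit weights, the normalization against a reference circuit, and all names (`CircuitMetrics`, `reward`, `W_GAIN`, and so on) are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CircuitMetrics:
    """Simulated performance of one candidate circuit (hypothetical container)."""
    gain_db: float
    bandwidth_mhz: float
    delay_ns: float
    power_mw: float


# Hypothetical weights; the abstract does not state how objectives are balanced.
W_GAIN, W_BW, W_DELAY, W_POWER = 1.0, 1.0, 1.0, 1.0


def reward(m: CircuitMetrics, ref: CircuitMetrics) -> float:
    """Scalar reward: higher gain and bandwidth are rewarded,
    higher delay and power are penalized. Each term is normalized
    by the corresponding value of a reference circuit so that the
    four objectives are on a comparable scale (an assumption)."""
    return (
        W_GAIN * m.gain_db / ref.gain_db
        + W_BW * m.bandwidth_mhz / ref.bandwidth_mhz
        - W_DELAY * m.delay_ns / ref.delay_ns
        - W_POWER * m.power_mw / ref.power_mw
    )
```

Under these assumptions, each concurrent agent would call `reward` on the simulated metrics after every structural action (adding, deleting, or modifying a connection line, or adjusting a component parameter), and the Critic would use the resulting scalar to estimate the advantage that updates the Actor.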