In this paper, a self-learning control scheme is proposed for the infinite-horizon optimal control of affine nonlinear systems based on the action-dependent heuristic dynamic programming (ADHDP) algorithm. The policy iteration technique is introduced to derive the optimal control policy, together with feasibility and convergence analysis. It is shown that the "greedy" control action for each state exists and is unique, that the control policy learned after each policy iteration is admissible, and that the optimal control policy can be obtained. Two three-layer perceptron neural networks are employed to implement the scheme. The critic network is trained with a novel rule so as to conform to the Bellman equation, and the action network is trained to yield an improved control policy. The two training processes alternate until the optimal control policy is achieved. Two simulation examples are provided to validate the effectiveness of the approach.
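
As a rough illustration of the alternating structure described above, the sketch below runs policy iteration on a coarsely discretized scalar system, with a tabular Q-function standing in for the critic network and a greedy lookup table standing in for the action network. The dynamics, stage cost, grid resolution, and the discount factor (included only to keep the tabular evaluation step well-posed; the paper addresses the undiscounted infinite-horizon cost) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the policy-iteration loop: alternate (i) policy evaluation,
# making Q consistent with the Bellman equation under the current policy,
# and (ii) policy improvement, taking the "greedy" control at every state.
# System, cost, grid, and discount below are illustrative assumptions.

f = lambda x: 0.9 * np.sin(x)     # drift of x_{k+1} = f(x_k) + g(x_k) u_k
g = lambda x: 1.0                 # input gain (assumed constant here)
gamma = 0.95                      # discount, only to keep the tabular sweep well-posed

xs = np.linspace(-2.0, 2.0, 81)   # discretized state space
us = np.linspace(-2.0, 2.0, 41)   # discretized control set
X, Uc = np.meshgrid(xs, us, indexing="ij")
cost = X**2 + Uc**2               # stage cost U(x, u) = x^2 + u^2
# index of the grid state nearest to x' = f(x) + g(x) u, for every (x, u) pair
next_idx = np.abs((f(X) + g(X) * Uc)[..., None] - xs).argmin(axis=2)

Q = np.zeros((xs.size, us.size))          # "critic": tabular Q-function
policy = np.full(xs.size, us.size // 2)   # "action network": start from u = 0

for it in range(100):
    # Policy evaluation: solve Q(x, u) = U(x, u) + gamma * Q(x', mu(x'))
    for _ in range(1000):
        Q_new = cost + gamma * Q[next_idx, policy[next_idx]]
        if np.max(np.abs(Q_new - Q)) < 1e-8:
            Q = Q_new
            break
        Q = Q_new
    # Policy improvement: greedy (cost-minimizing) control at every state
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):  # no further improvement
        break
    policy = new_policy

print(f"converged after {it} improvement steps")
print("greedy control at x = 1.0:", us[policy[np.abs(xs - 1.0).argmin()]])
```

In the paper itself both the critic and the action network are three-layer perceptrons updated by training rules rather than by exhaustive minimization over a grid; the sketch only mirrors the alternation between the two training processes.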