An episodic unsupervised learning algorithm using the Q-Learning method is developed to learn the optimal shape and shape change policy of a morphing airfoil. Optimality is addressed by reward functions based on airfoil properties such as lift coefficient, drag coefficient, and moment coefficient about the leading edge, which represent optimal shapes for specified flight conditions. The reinforcement learning, as applied to morphing, is integrated with a computational model of an airfoil. The methodology is demonstrated with numerical examples of a NACA type airfoil that autonomously morphs in two degrees of freedom, thickness and camber, to a shape that corresponds to specified goal requirements. Because the thickness and camber of the airfoil are continuous, this paper addresses the convergence of the learning algorithm for several action step sizes. Convergence is also analyzed with three candidate policies: 1) a fully random exploration policy, 2) a policy annealing from random exploration to exploitation, and 3) an annealing discount factor in addition to the annealing policy. The results presented in this paper show the inherent differences in the learned action-value function when the state space discretization, policy, and learning parameters differ. A policy annealing from fully explorative to almost fully exploitative yielded the highest rate of convergence of the policies considered. Also, the coarsest discretization of the state space resulted in convergence of the action-value function in as few as 200 episodes.
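
As a rough illustration of the approach summarized above, the following is a minimal sketch of tabular Q-Learning on a discretized two-degree-of-freedom (thickness, camber) state space with a policy that anneals from random exploration toward exploitation. It is not the implementation used in this paper. The grid size, goal cell, learning parameters, annealing schedule, and the function names `train_q` and `greedy_rollout` are all assumptions made for illustration, and the reward (1 on reaching a goal cell, 0 otherwise) is a placeholder for a reward computed from lift, drag, and moment coefficients by an airfoil model.

```python
import random

def train_q(n_thick=5, n_camber=5, goal=(3, 2), episodes=500,
            alpha=0.5, gamma=0.9, max_steps=50, seed=0):
    """Tabular Q-Learning over (thickness index, camber index) states.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Four actions: one discretization step up or down in thickness or camber.
    actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    q = {(t, c): [0.0] * len(actions)
         for t in range(n_thick) for c in range(n_camber)}

    for ep in range(episodes):
        # Assumed linear annealing: fully random at first, 5% random at the end.
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))
        state = (rng.randrange(n_thick), rng.randrange(n_camber))
        for _ in range(max_steps):
            if state == goal:
                break
            if rng.random() < eps:
                a = rng.randrange(len(actions))                      # explore
            else:
                a = max(range(len(actions)), key=lambda i: q[state][i])  # exploit
            dt, dc = actions[a]
            # Clamp to the grid so the shape stays within its allowed bounds.
            nxt = (min(max(state[0] + dt, 0), n_thick - 1),
                   min(max(state[1] + dc, 0), n_camber - 1))
            # Placeholder reward; an airfoil model would supply this in practice.
            reward = 1.0 if nxt == goal else 0.0
            future = 0.0 if nxt == goal else gamma * max(q[nxt])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = nxt
    return q, actions

def greedy_rollout(q, actions, start, goal, n_thick=5, n_camber=5, max_steps=25):
    """Follow the learned action-value function greedily from `start`."""
    path = [start]
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        a = max(range(len(actions)), key=lambda i: q[state][i])
        dt, dc = actions[a]
        state = (min(max(state[0] + dt, 0), n_thick - 1),
                 min(max(state[1] + dc, 0), n_camber - 1))
        path.append(state)
    return path

if __name__ == "__main__":
    q, actions = train_q()
    print(greedy_rollout(q, actions, start=(0, 0), goal=(3, 2)))
```

In this sketch, a coarser grid (smaller `n_thick` and `n_camber`) gives fewer state-action pairs to visit, which is one intuitive reason a coarse discretization can converge in fewer episodes; a fully random policy corresponds to holding `eps` at 1.0 throughout.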