Accurate early classification of elephant flows (elephants) is important for network management and resource optimization. Elephant models, which are mainly based on the byte count of flows, can achieve high accuracy, but not in a time-efficient manner. Time efficiency becomes even worse when the flows to be classified are sampled by flow entry timeout over Software-Defined Networks (SDNs) to achieve better resource efficiency. This paper addresses this situation by combining co-training and Reinforcement Learning (RL) into a closed-loop classification approach that divides the entire classification process into episodes, each involving two elephant models. One model predicts elephants and is retrained on a selection of flows that the other model labels automatically online. RL is used to formulate a reward function that estimates the values of the possible actions based on the current states of both models, and to adjust the ratio of flows to be labeled in each phase accordingly. Extensive evaluation on real traffic traces shows that the proposed approach can stably predict elephants with an accuracy of over 80% using only the packets received in the first 10% of their lifetime, while consuming only about 10% more control channel bandwidth than the baseline over the evolved SDNs.