Unmanned aerial vehicles (UAVs) organized in a flying ad hoc network (FANET) can successfully carry out complex missions. However, FANETs lack infrastructure, rely on wireless communication channels, have a dynamic topology, and suffer from unreliable communication between UAVs, so cyberattacks, especially wormhole attacks, degrade the performance of routing schemes. Maintaining communication security while guaranteeing quality of service (QoS) is therefore very challenging. In this paper, a novel Q-learning-based secure routing scheme (QSR) is presented for FANETs. QSR aims to provide a robust defense against wormhole attacks, in particular wormhole through encapsulation and wormhole through packet relay. It consists of two processes: secure neighbor discovery and Q-learning-based secure routing. In the first process, each UAV securely obtains information about its neighboring UAVs. To secure communication in this process, a local monitoring system is designed to counteract the wormhole attack through packet relay. This system checks the data packets exchanged between neighboring UAVs and defines three rules according to the behavior of wormholes. In the second process, UAVs perform distributed Q-learning-based routing to counteract the wormhole attack through encapsulation. To reward the safest paths, a reward function is introduced based on five factors: average one-hop delay, hop count, data loss ratio, packet transmission frequency (PTF), and packet reception frequency (PRF). Finally, QSR is implemented in the NS2 simulator and evaluated under different scenarios. The evaluation results show that QSR outperforms TOPCM, MNRiRIP, and MNDA in terms of accuracy, malicious node detection rate, data delivery ratio, and data loss ratio. However, it incurs a higher delay than TOPCM.
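
As a rough illustration of how a reward built from the five factors could feed a Q-learning update, consider the minimal Python sketch below. It is not the QSR formulation: the equal weighting, the normalization bounds `MAX_DELAY_S` and `MAX_HOPS`, the assumption that higher PTF and PRF values are better, and the names `NeighborStats`, `reward`, and `q_update` are all hypothetical choices made only for this example. The update itself is the standard Q-learning rule.

```python
from dataclasses import dataclass

# Hypothetical normalization bounds; the abstract gives no concrete values.
MAX_DELAY_S = 0.1
MAX_HOPS = 10


@dataclass
class NeighborStats:
    """Per-neighbor measurements named after the five factors in the abstract."""
    avg_one_hop_delay_s: float  # average one-hop delay, in seconds
    hop_count: int              # hops to the destination via this neighbor
    data_loss_ratio: float      # fraction of lost data, in [0, 1]
    ptf: float                  # packet transmission frequency, assumed scaled to [0, 1]
    prf: float                  # packet reception frequency, assumed scaled to [0, 1]


def clamp01(x: float) -> float:
    """Clip a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def reward(stats: NeighborStats) -> float:
    """Equal-weight average of five scores in [0, 1] (an assumed form, not QSR's)."""
    scores = [
        1.0 - clamp01(stats.avg_one_hop_delay_s / MAX_DELAY_S),  # lower delay is better
        1.0 - clamp01(stats.hop_count / MAX_HOPS),               # fewer hops is better
        1.0 - clamp01(stats.data_loss_ratio),                    # lower loss is better
        clamp01(stats.ptf),  # assumption: higher is treated as better
        clamp01(stats.prf),  # assumption: higher is treated as better
    ]
    return sum(scores) / len(scores)


def q_update(q_old: float, r: float, best_next_q: float,
             alpha: float = 0.5, gamma: float = 0.8) -> float:
    """Standard Q-learning update: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    return q_old + alpha * (r + gamma * best_next_q - q_old)
```

In a distributed setting, each UAV would keep one Q-value per (destination, neighbor) pair, call `q_update` when feedback arrives from a neighbor, and forward packets through the neighbor with the highest Q-value.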