Cognitive radio (CR) systems have emerged as an effective means of improving spectrum efficiency and meeting growing communication demands. This study focuses on a flexible CR system based on opportunistic spectrum access, in which the secondary network actively senses the spectrum usage of the primary network and transmits on spectrum resources that are unoccupied. Specifically, we introduce unmanned aerial vehicle (UAV) technology into the CR system to make spectrum management more flexible and adaptable, which in turn improves the transmission efficiency of low-altitude UAV networks. The objective is to maximize the average achievable rate of the secondary users (SUs) by jointly optimizing the secondary UAV trajectories, the primary UAV trajectories, the secondary UAV beamforming, the subchannel allocation, and the sensing time. We solve this problem with deep reinforcement learning (DRL), which, compared with traditional optimization algorithms, has lower computational complexity and converges faster. To address the mixed action space problem, we propose a Dueling DQN-Soft Actor-Critic algorithm. Simulation results show that the proposed approach significantly outperforms traditional baseline schemes, achieving higher spectrum efficiency and data transmission rates while minimizing interference with the primary network. By combining UAV technology with DRL, this work opens new opportunities, and raises new challenges, for the future development of cognitive communication systems.
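As an illustration of what a mixed action space involves, the following minimal sketch shows one way a dueling Q-value head and a Soft Actor-Critic style policy head could each contribute one part of a single action. It is not the algorithm proposed in this work. The assignment of subchannel selection to the discrete head and of a bounded control vector to the continuous head is an assumption made for illustration, as are all function names and numerical values; the learned networks are replaced by fixed numbers.

```python
import math
import random


def dueling_q_values(state_value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def squashed_gaussian_sample(means, log_stds, rng):
    """SAC-style continuous action: Gaussian sample squashed by tanh into [-1, 1]."""
    return [math.tanh(rng.gauss(m, math.exp(s))) for m, s in zip(means, log_stds)]


def select_hybrid_action(state_value, advantages, means, log_stds, rng):
    """Pick a discrete index greedily and sample a continuous vector."""
    q_values = dueling_q_values(state_value, advantages)
    discrete = max(range(len(q_values)), key=lambda i: q_values[i])
    continuous = squashed_gaussian_sample(means, log_stds, rng)
    return discrete, continuous


# Hypothetical head outputs standing in for trained networks (assumed values).
rng = random.Random(0)
subchannel, control = select_hybrid_action(
    state_value=1.0,
    advantages=[0.2, -0.1, 0.5],   # one advantage per candidate subchannel
    means=[0.0, 0.3],              # e.g. a two-dimensional normalized control
    log_stds=[-1.0, -1.0],
    rng=rng,
)
```

In this sketch the discrete index is chosen greedily from the dueling Q-values, while the continuous vector stays stochastic, as in Soft Actor-Critic; the bounded outputs would still need to be rescaled to physical ranges such as flight displacement or sensing duration.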