Primary user (PU) signal detection or classification is a critical component of cognitive radio (CR) related wireless communication applications. In CR, the PU detection methods are mostly based on statistical models, and their detection performance heavily relies on the accuracy of assumed models. In this paper, we design a novel detector, dubbed as PU-Net, that dynamically learns the PU activity patterns in a cognitive 5G smart city, where a network of unmanned aerial vehicles (UAVs) is deployed as flying base stations to serve the Internet-of-Things (IoT) users. Unlike the traditional schemes, the PU-Net is free from signal-noise model assumptions and is leveraged through deep residual learning integrated with atrous spatial pyramid pooling (ASPP) to sense the PU's transmitted signal patterns in the network. The PU-Net detects and classifies the active and idle PU states by exploiting the multilevel spatial-temporal features in the signal and noise frames. The proposed model is trained using locally synthesized Rayleigh channel-impaired data with large variability of modulated signals and different noise floor regimes. Additionally, the PU-Net model is blind-tested and evaluated on real-world over-the-air signals and with variable-length frames and varying channel effects at secondary users (SUs). With extensive experiments, it is shown that PU-Net outperforms other benchmark detectors, obtaining an accuracy of 0.9974, with 0.9978 recall and 0.9970 precision in detecting and classifying the PU transmitted signal patterns. Correspondingly, the proposed PU-Net can be adopted for IoT/UAV-assisted communication systems in optimizing spectrum efficiency and resolving the coexistence issues in 5G and beyond networks.