Unmanned aerial vehicle (UAV) city patrols are of great significance in safeguarding residents’ lives and property and in maintaining the normal operation of the city. However, object detection in UAV images faces challenges such as numerous small-scale objects, complex backgrounds, and demanding detection-speed requirements. To address these issues, we introduce a Real-time Small Object Detection network in UAV-vision (RTS-Net), tailored for UAV patrols. First, we introduce a multiscale feature fusion module (MFFM) designed to strengthen the expressiveness of features across scales, thereby improving the detection of small objects. Second, leveraging attention mechanisms, we present the coordinated attention detection module (CADM), which improves the model’s ability to accurately separate objects from the background in large, complex scenes. Finally, a lightweight real-time feature extraction module (RFEM) is designed to reduce the model’s computational complexity and increase inference speed. On the UAV road patrol image dataset we constructed, the proposed method attains a detection accuracy of 89.9% mAP, surpassing prevailing detection methods, particularly for small-scale objects, while achieving an inference speed of 163.9 FPS. The experimental results show that RTS-Net can meet the requirements for accurate and efficient detection of ground objects by various UAV platforms in different complex scenarios.