Urban public safety management relies heavily on video surveillance systems, which provide crucial visual data for resolving a wide range of incidents and controlling unlawful activities. Traditional target detection methods predominantly adopt a two-stage approach that emphasizes precision in identifying objects such as pedestrians and vehicles. Because these objects are typically sparse in large-scale, lower-quality surveillance footage, the initial processing stage incurs considerable redundant computation, which constrains real-time detection and increases processing costs. Furthermore, transmitting raw images and videos laden with superfluous information to centralized back-end systems places a significant burden on network communication and fails to exploit the computational resources available at diverse surveillance nodes. This study introduces DiffRank, a novel preprocessing method for fixed-angle video imagery in urban surveillance. The method generates candidate regions during preprocessing, thereby reducing redundant object detection and improving the efficiency of the detection algorithm. Drawing on change detection principles, we develop a background feature learning approach based on shallow features that learns the characteristics of fixed-area backgrounds rather than identifying the background directly. As a result, changes in Regions of Interest (ROIs) are discerned using computationally inexpensive shallow features, markedly accelerating ROI proposal generation and reducing the computational demands of subsequent object detection and classification. Comparative analysis on several public and private datasets shows that DiffRank, while maintaining high accuracy, substantially outperforms existing baselines in speed, particularly at larger image sizes (e.g., an improvement exceeding 300% at 1920×1080 resolution). Moreover, the method is more robust than the baselines, efficiently disregarding static targets such as mannequins in display windows. These advances in candidate-region preprocessing balance detection accuracy against overall detection speed, making the algorithm well suited to real-time on-site analysis in edge computing scenarios and cloud-edge collaborative computing environments.
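To illustrate the general style of pipeline the abstract describes, a change-detection stage that proposes ROIs from a learned fixed-area background before any heavy detector runs, the sketch below uses a standard Gaussian-mixture background subtractor as a stand-in for the shallow background features. This is not the DiffRank implementation; the subtractor parameters, morphology kernel, minimum region area, and the video path are assumed values chosen only for illustration.

```python
# Minimal sketch of change-detection-driven ROI proposal on fixed-angle video.
# Illustrative approximation only (not DiffRank): a MOG2 background model
# flags changed pixels, and bounding boxes around the changed regions become
# candidate ROIs for a downstream detector. Thresholds are assumed values.

import cv2


def propose_rois(frames, min_area=400):
    """Yield (frame_index, list of (x, y, w, h) boxes) for regions that
    differ from the learned fixed-area background."""
    # Shallow background model over raw pixel intensities.
    bg_model = cv2.createBackgroundSubtractorMOG2(
        history=200, varThreshold=25, detectShadows=False)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    for idx, frame in enumerate(frames):
        fg_mask = bg_model.apply(frame)                               # 0/255 change mask
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)   # remove speckle
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, kernel)  # fill small gaps

        contours, _ = cv2.findContours(
            fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]
        yield idx, boxes


if __name__ == "__main__":
    cap = cv2.VideoCapture("surveillance.mp4")  # hypothetical fixed-angle clip
    frames = iter(lambda: cap.read()[1], None)
    for idx, boxes in propose_rois(frames):
        # Only these candidate crops would be forwarded to the object detector.
        print(idx, boxes)
```

In this style of pipeline, the cheap proposal stage keeps per-frame cost low because the expensive detector only sees the proposed crops, and objects that remain static over the background model's history window are absorbed into the background and stop generating proposals, which is consistent with the robustness behavior (e.g., ignoring mannequins in display windows) described above.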