To address the problems faced by the connection management module in single-packet processing for smart city security video retrieval, we first propose a traffic locality quantization index, derived from the traffic characteristics of backbone links, to quantitatively analyze traffic locality on those links. We then propose a deep-learning-based key-frame abstraction and retrieval method to improve the efficiency and accuracy of video retrieval. In this method, an adaptive key-frame selection algorithm is designed, an existing convolutional neural network (CNN) framework is used to extract key-frame features, and unsupervised, semi-supervised, and supervised retraining models are designed to make the CNN's feature extraction more effective and, in turn, video retrieval more accurate. Experimental results on public video datasets show that the proposed key-frame image retrieval model represents key frames with good precision and achieves high accuracy and efficiency in video retrieval.
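The abstract does not specify the adaptive key-frame selection rule or the similarity measure used for retrieval. The following is a minimal Python sketch under stated assumptions, not the authors' method. Each frame is assumed to be already reduced to a feature vector (for example a color histogram or a CNN embedding). A frame is taken as a key frame when its distance to the preceding frame exceeds an adaptive threshold of the mean plus `alpha` standard deviations of all consecutive-frame distances in that video. Videos are then ranked by the highest cosine similarity between a query vector and any of their key frames. The function names and the `alpha` parameter are hypothetical.

```python
import math
from statistics import mean, pstdev


def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_key_frames(frames, alpha=1.0):
    """Return indices of key frames (hypothetical adaptive rule).

    A frame is a key frame when its distance to the previous frame exceeds
    mean + alpha * std of all consecutive distances in the same video.
    Frame 0 is always kept as the first key frame.
    """
    if len(frames) < 2:
        return list(range(len(frames)))
    diffs = [l2(frames[i], frames[i - 1]) for i in range(1, len(frames))]
    # Threshold adapts to this video's own motion statistics.
    threshold = mean(diffs) + alpha * pstdev(diffs)
    keys = [0]
    for i, d in enumerate(diffs, start=1):
        if d > threshold:
            keys.append(i)
    return keys


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_videos(query, index):
    """Rank video ids by best key-frame match to the query vector.

    `index` maps a video id to the list of its key-frame feature vectors
    (assumed here to come from a CNN feature extractor).
    """
    scores = {
        vid: max(cosine(query, f) for f in feats)
        for vid, feats in index.items()
        if feats
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    frames = [[0, 0], [0, 0.1], [5, 5], [5, 5.1], [5, 5]]
    print(select_key_frames(frames))  # a shot change at frame 2
    print(rank_videos([1, 1], {"a": [[0, 1]], "b": [[1, 1.1]]}))
```

Because the threshold depends on each video's own distance statistics, a static surveillance shot and a busy street scene receive different cut-offs without manual tuning; `alpha` controls how many key frames are kept.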