Abstract

Intelligent transportation systems deploy thousands of video cameras, and analyzing their live streams is of significant importance to public safety. As the volume of streaming video grows, it becomes infeasible to have human operators sitting in front of hundreds of screens to catch suspicious activities or detect objects of interest in real time. With millions of traffic surveillance cameras installed, video retrieval is more vital than ever. To that end, this article proposes a long-video event retrieval algorithm based on superframe segmentation. First, by detecting the motion amplitude of the long video, a large number of redundant frames are removed, reducing the number of frames that must be processed in subsequent steps. Then, a superframe segmentation algorithm based on feature fusion divides the remaining video into several Segments of Interest (SOIs) that contain the video events. Finally, a trained semantic model matches the generated answers against the text question, and the result with the highest matching score is taken as the video segment corresponding to the question. Experimental results demonstrate that the proposed long-video event retrieval and description method significantly improves the efficiency and accuracy of semantic description and significantly reduces retrieval time.
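
The abstract gives no implementation details, but the first stage, motion-amplitude filtering of redundant frames, can be sketched briefly. The following is a minimal sketch, assuming the amplitude of a frame is measured as the mean absolute pixel difference from the previous grayscale frame; the function name, threshold value, and OpenCV-based measure are illustrative assumptions, not the authors' published method.

    import cv2
    import numpy as np

    def filter_redundant_frames(video_path, motion_threshold=2.0):
        # Keep only frames whose motion amplitude exceeds the threshold.
        # "Motion amplitude" is approximated here by the mean absolute
        # pixel difference between consecutive grayscale frames; both
        # the measure and the threshold are assumptions for illustration.
        cap = cv2.VideoCapture(video_path)
        kept_indices = []
        prev_gray = None
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Always keep the first frame; afterwards keep a frame only
            # if it differs enough from its predecessor.
            if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > motion_threshold:
                kept_indices.append(idx)
            prev_gray = gray
            idx += 1
        cap.release()
        return kept_indices

The later stages, superframe segmentation and semantic matching, would then operate only on the kept frames, which is what reduces the subsequent computation and retrieval time.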
