As technology advances worldwide, the video surveillance market is growing rapidly across a wide range of fields. Surveillance cameras produce enormous amounts of data, which makes monitoring, browsing, and retrieving a specific object in a long video difficult. Because human resources and browsing time are limited, a new video analytics model is needed to handle more complex tasks such as object detection and query retrieval. Current approaches rely on techniques such as unsupervised segmentation, multiscale segmentation, and feature-based description, but these methods often consume extensive space and time. This work presents a solution for retrieving targeted objects from surveillance videos in response to user queries. Users enter text queries through a graphical user interface deployed on a Jetson Xavier development board, and YOLOv8 performs the object detection used to extract the frames relevant to each query. The result is a time-efficient and highly accurate automated model for object detection and query retrieval that avoids the human errors associated with manually locating objects in videos.
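The query-retrieval step described above can be illustrated with a minimal sketch. It assumes the detector has already produced per-frame detections (class label plus confidence); in the described system these would come from YOLOv8, but here they are hard-coded so the example stays self-contained. The names `Detection`, `parse_query`, `retrieve_frames`, and the `min_confidence` threshold are hypothetical and not taken from the original work.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object in one frame (hypothetical record layout)."""
    frame_index: int
    label: str
    confidence: float


def parse_query(query, known_labels):
    """Return the known class labels that appear as whole words in the query."""
    text = query.lower()
    return {
        label
        for label in known_labels
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text)
    }


def retrieve_frames(detections, query, known_labels, min_confidence=0.5):
    """Return sorted frame indices containing any queried label.

    Detections below `min_confidence` (an assumed threshold) are ignored.
    """
    wanted = parse_query(query, known_labels)
    hits = {
        d.frame_index
        for d in detections
        if d.label in wanted and d.confidence >= min_confidence
    }
    return sorted(hits)


# Stand-in detections; a real pipeline would fill these from the detector output.
detections = [
    Detection(0, "person", 0.91),
    Detection(0, "car", 0.40),   # dropped: below the confidence threshold
    Detection(3, "car", 0.88),
    Detection(7, "dog", 0.75),
    Detection(9, "car", 0.93),
]
labels = {"person", "car", "dog"}

print(retrieve_frames(detections, "show me every car", labels))  # [3, 9]
```

Matching on whole words keeps a query such as "scarf" from triggering the "car" class. A fuller system might map synonyms or plurals to class labels, which this sketch does not attempt.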