Abstract
In modern intelligent systems, video object retrieval remains a challenging task because objects are often affected by confusion with other objects, similar appearance, varying poses, small size, and interactions among multiple objects. To address these challenges, video objects can be retrieved based on the trajectory points of multiple moving objects. However, when an object is occluded, the computed trajectory points can be considerably distorted. To overcome this problem, we propose a technique that combines a query-specific distance with a hybrid tracking model for video object retrieval. To evaluate the proposed method, five videos were collected from the CAVIAR dataset. The proposed tracking process was applied to these five videos, and its performance was analysed in terms of precision, recall, and F-measure. The results show that the proposed hybrid model attained a higher F-measure (76.7%) than existing tracking models such as the nearest neighbourhood algorithmic model and the spatial-exponential weighted moving average model.