Abstract
Tracking human movement and interactions in complex environments is a key challenge in computer vision, especially for multi-object tracking. Transformer-based models have shown promise here because they can recognize complex patterns across sequences. However, their high computational demands and substantial training data requirements often restrict their real-world applicability. This study aimed to enhance multi-object tracking by introducing a Compact Model Adjustment approach that integrates trainable rank-decomposition matrices within the Transformer architecture. The approach freezes the pre-trained model weights and adds trainable low-rank matrices to each layer, substantially reducing the number of parameters that need updating during training. This design allows the model to retain its pre-trained knowledge while adapting efficiently to new tasks, thereby reducing the overall computational load. Additionally, the proposed approach uses data from both the current and previous frames to refine object localization and association. Experimental results on the MOT17 benchmark showed that the method achieved a Multiple Object Tracking Accuracy (MOTA) of 71.0, comparable to state-of-the-art techniques, while improving computational efficiency. This work provides a practical solution for real-world applications in areas such as surveillance, autonomous driving, and sports analytics.
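The idea of freezing pre-trained weights and training only low-rank matrices can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LowRankAdaptedLinear`, the `scale` factor, and the zero initialization of `B` (so that the adapted layer initially matches the frozen one) are assumptions made for illustration, and plain Python lists stand in for a tensor library.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LowRankAdaptedLinear:
    """Hypothetical linear layer: frozen weight W plus a trainable rank-r update B @ A."""

    def __init__(self, weight, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during adaptation
        # Assumption: A starts small and random, B starts at zero,
        # so the low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = scale

    def effective_weight(self):
        """Return W + scale * (B @ A), the weight actually applied to inputs."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to an input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameters(self):
        """Count entries of A and B, the only values updated during training."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_parameters(self):
        """Count entries of the frozen pre-trained weight."""
        return len(self.weight) * len(self.weight[0])
```

Under these assumptions, a 64 x 64 layer adapted with rank 4 would train 4·64 + 64·4 = 512 values while its 4096 pre-trained weights stay fixed, which is the kind of parameter reduction the abstract describes.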