Abstract: Multi-object tracking (MOT) in crowded scenes faces challenges such as target occlusion and interference from similar-looking objects. Detection models that treat the target center as the positive sample often struggle with noise introduced by ambiguous annotations. To address these issues, we propose separating target features into distinct frequency bands by means of wavelet decomposition. Features obtained from a two-dimensional wavelet decomposition are orthogonal and complementary along the horizontal and vertical directions. Low-frequency components are usually associated with visible targets, while high-frequency energy often comes from occluded targets, which makes it possible to discriminate between overlapping targets at different layers. Experiments on the MOT17 and MOT20 benchmarks (motchallenge.net) show that our approach performs competitively with current state-of-the-art methods.
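To make the frequency separation concrete, the following is a minimal sketch of a single-level two-dimensional wavelet decomposition. It rests on several assumptions that the abstract does not state: the Haar basis is used purely for illustration, the input is a plain 2D list rather than a learned feature map, and the names `haar_dwt2` and `energy` are hypothetical. It is not the authors' implementation.

```python
def haar_dwt2(x):
    """Single-level 2D Haar decomposition (illustrative, orthonormal scaling).

    x is a 2D list with even height and width. Returns four sub-bands,
    each half the size of x in both directions:
      ll: local average (low-frequency content)
      lh: top rows minus bottom rows (responds to horizontal edges)
      hl: left columns minus right columns (responds to vertical edges)
      hh: diagonal differences
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            # Each non-overlapping 2x2 block yields one coefficient per band.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)
            row_lh.append((a + b - c - d) / 2)
            row_hl.append((a - b + c - d) / 2)
            row_hh.append((a - b - c + d) / 2)
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for row in band for v in row)


# Toy input (an assumption, not data from the paper): a vertical edge
# between the first and second columns.
feature_map = [[1, 5, 5, 5] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(feature_map)
total_energy = energy(ll) + energy(lh) + energy(hl) + energy(hh)
```

In this toy input the vertical edge appears only in `hl`, while `lh` and `hh` stay at zero, which illustrates how the directional sub-bands carry complementary information. Because the transform is orthonormal, `total_energy` equals the energy of `feature_map`.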