Abstract

Vehicle re-identification involves searching for images of the same vehicle across different cameras and is important for intelligent traffic control. Convolutional neural networks (CNNs) have been successful in re-identification, but CNN-based methods process only one local neighbourhood at a time, and information is lost during pooling operations. To mitigate these shortcomings of CNNs, we propose a novel vehicle re-identification framework (Vehicle ReID) based on a vision transformer with gradient accumulation. The training images are split into overlapping patches, and each patch is flattened into a 1D vector. Positional, camera and view embeddings are added to the patch embeddings, which are fed to the vision transformer to generate a global feature. This global feature is then passed to three branches: ID, colour and type classification. The ID branch uses triplet and cross-entropy losses, while the colour and type branches use only cross-entropy loss. Gradient accumulation is employed during training: gradients are accumulated over iterations within an epoch, and the network weights are updated only when the number of iterations reaches a predefined step size. This allows the model to be trained as if with a larger batch size without requiring GPU upgrades. To validate the effectiveness of the proposed framework, mean average precision (mAP) and the Rank-1 and Rank-5 hit rates are computed on the VeRi dataset.
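
To illustrate the training scheme described above, the following is a minimal PyTorch-style sketch of the three-branch loss and gradient accumulation, not the authors' implementation. The backbone stand-in, the class counts, the step size of 4, the triplet margin and the naive anchor/positive/negative slicing are all assumptions for illustration; the actual framework uses a vision transformer over overlapping patches with positional, camera and view embeddings, and an identity-aware sampler.

```python
import torch
import torch.nn as nn

class DummyReIDModel(nn.Module):
    """Hypothetical stand-in for the ViT backbone plus the three branches."""
    def __init__(self, feat_dim=256, num_ids=576, num_colours=10, num_types=9):
        super().__init__()
        self.backbone = nn.Linear(3 * 224 * 224, feat_dim)   # placeholder for the vision transformer
        self.id_head = nn.Linear(feat_dim, num_ids)           # ID classification branch
        self.colour_head = nn.Linear(feat_dim, num_colours)   # colour classification branch
        self.type_head = nn.Linear(feat_dim, num_types)       # type classification branch

    def forward(self, x):
        feat = self.backbone(x.flatten(1))                    # global feature
        return feat, self.id_head(feat), self.colour_head(feat), self.type_head(feat)

model = DummyReIDModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
ce_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)               # margin value assumed

accum_steps = 4                                               # predefined step size (assumed value)
optimizer.zero_grad()

for it in range(16):                                          # stand-in for mini-batch iterations in an epoch
    # Random stand-ins for images and labels; a real data loader would supply these.
    imgs = torch.randn(8, 3, 224, 224)
    id_labels = torch.randint(0, 576, (8,))
    colour_labels = torch.randint(0, 10, (8,))
    type_labels = torch.randint(0, 9, (8,))

    feat, id_logits, colour_logits, type_logits = model(imgs)

    # ID branch: triplet + cross-entropy; colour and type branches: cross-entropy only.
    # The anchor/positive/negative slices below are naive placeholders; a real
    # sampler would mine them by vehicle identity.
    loss = (ce_loss(id_logits, id_labels)
            + triplet_loss(feat[:2], feat[2:4], feat[4:6])
            + ce_loss(colour_logits, colour_labels)
            + ce_loss(type_logits, type_labels))

    # Gradient accumulation: scale the loss, accumulate gradients each iteration,
    # and update the weights only every accum_steps iterations.
    (loss / accum_steps).backward()
    if (it + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Scaling the loss by the step size keeps the effective gradient comparable to that of a single large batch of size accum_steps times the mini-batch size, which is what lets a memory-limited GPU emulate large-batch training.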
