Abstract
The proposed method addresses three challenges in object segmentation: learning from unlabeled data, performing robustly in unknown environments, and differentiating foreground from background. The objectives of this work are to improve object segmentation using an unannotated pool of data, to derive a model structure that demystifies and trains large AI models through knowledge distillation, and to implement a self-supervised learning framework for analysing the impacting and driving factors. The proposed method handles unlabeled data effectively, improving accuracy and generalization across diverse scenarios. It remains reliable in unknown environments, producing consistent and accurate object segmentation in complex visual contexts. By incorporating novel techniques, the approach addresses the longstanding problem of obtaining consistent, non-trivial solutions for object segmentation. The literature review shows that the DINO model with a ViT-B/8 backbone achieves 80.1% accuracy under linear evaluation, while the DINO model with a ViT-B/16 backbone attains the highest k-NN accuracy at 78.3%. The approach integrates deep neural networks for object segmentation with Kalman filtering for non-linear state estimation. Specifically, depth information obtained from point clouds is incorporated directly into the Unscented Kalman Filter, allowing efficient and accurate estimation. A one-parameter procedure identifies and eliminates irregularities in the point clouds, significantly improving segmentation performance. By combining self-supervised learning with distillation, the method avoids collapsing to trivial solutions, leading to improved performance under viewpoint and illumination changes.
Decision-making is deferred until a global view of the entire frame can be established, enabling accurate and consistent object segmentation in subsequent frames. The proposed method, SSL-DM+UKF, outperforms existing approaches in various challenges, including occlusion, viewpoint changes, and illumination variations. It achieves statistical superiority over the baseline SSL-Distillation Model by up to 2.50% in different scenarios. Incorporating depth observations through the Unscented Kalman Filter and using self-supervised learning with knowledge distillation are key factors contributing to its success. SSL-DM+UKF offers a valuable and efficient solution for real-time object segmentation in dynamic environments.
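To make the role of the depth observation concrete, the following is a minimal sketch of one Unscented Kalman Filter predict/update cycle, not the authors' implementation. It assumes a hypothetical state of lateral position, depth and their velocities, a constant-velocity motion model, and a measurement made of the pixel column of the segmented object's centroid (pinhole projection) plus its depth taken from the point cloud. The focal length, frame interval, noise covariances and all function names are illustrative assumptions.

```python
import numpy as np

FOCAL_PX = 500.0  # assumed focal length in pixels (hypothetical)
DT = 0.1          # assumed time between frames in seconds (hypothetical)


def sigma_points(mean, cov, kappa=1.0):
    """Return the 2n+1 sigma points and weights of the basic unscented transform."""
    n = mean.size
    root = np.linalg.cholesky((n + kappa) * cov)  # columns are the offsets
    pts = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 0.5 / (n + kappa))
    weights[0] = kappa / (n + kappa)
    return pts, weights


def motion(state):
    """Constant-velocity model for the state [x, z, vx, vz] (an assumption)."""
    x, z, vx, vz = state
    return np.array([x + vx * DT, z + vz * DT, vx, vz])


def measure(state):
    """Non-linear measurement: centroid pixel column and point-cloud depth."""
    x, z, _, _ = state
    return np.array([FOCAL_PX * x / z, z])


def ukf_step(mean, cov, obs, Q, R):
    """One predict/update cycle; Q and R are process and measurement noise."""
    # Predict: push sigma points through the motion model.
    pts, w = sigma_points(mean, cov)
    prop = np.array([motion(p) for p in pts])
    m_pred = w @ prop
    d = prop - m_pred
    P_pred = (w[:, None] * d).T @ d + Q

    # Update: redraw sigma points and push them through the measurement model.
    pts, w = sigma_points(m_pred, P_pred)
    zs = np.array([measure(p) for p in pts])
    z_pred = w @ zs
    dz = zs - z_pred
    dx = pts - m_pred
    S = (w[:, None] * dz).T @ dz + R      # innovation covariance
    Pxz = (w[:, None] * dx).T @ dz        # state/measurement cross-covariance
    K = Pxz @ np.linalg.inv(S)            # Kalman gain
    new_mean = m_pred + K @ (obs - z_pred)
    new_cov = P_pred - K @ S @ K.T
    return new_mean, 0.5 * (new_cov + new_cov.T)  # keep covariance symmetric


def run_demo(steps=50):
    """Track a simulated object using noise-free observations of its true path."""
    truth = np.array([0.5, 4.0, 0.1, -0.2])
    mean = np.array([0.0, 5.0, 0.0, 0.0])  # deliberately wrong initial guess
    cov = np.eye(4)
    Q = 1e-3 * np.eye(4)
    R = np.diag([4.0, 0.01])  # pixel variance, depth variance (assumed values)
    for _ in range(steps):
        truth = motion(truth)
        mean, cov = ukf_step(mean, cov, measure(truth), Q, R)
    return mean, truth
```

Calling `run_demo()` returns the final estimate alongside the simulated ground truth. In this sketch the depth term enters the filter as an ordinary measurement component, so no separate fusion stage is needed.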