Abstract

The progress of computer vision has extended visual recognition tasks from images to the video domain. In recent years, multi-object tracking and segmentation (MOTS), which simultaneously tracks and segments multiple objects in video frames, has attracted much research attention, as it holds valuable potential for emerging technologies such as autonomous driving. In this work, I propose an effective framework for the MOTS task on live-streaming video frames that simultaneously detects objects, segments instances, and tracks movements across frames in an end-to-end learnable fashion. The core idea of the proposed flow-guided association (FGA) method is to leverage flow fields to enrich pixel-level feature representations and to use associative connections across the individual task heads (i.e., the detection head, segmentation head, and tracking head) to facilitate feature sharing. The novel connection architecture allows the upper-level tasks (i.e., segmentation and tracking) to fire accurately on the regions of interest (RoIs) for the final prediction. Extensive evaluations on the KITTI MOTS dataset, the MOTSChallenge dataset, and the new DGL-MOTS dataset indicate that the proposed method is competitive with the best existing methods.

In addition, current MOTS datasets inadequately capture the real-world complexity needed to train a deep-learning algorithm to handle the extensive variety of driving settings. To address this deficiency, I present the DGL-MOTS dataset and DG-Labeler, a data-annotation tool for MOTS work. DGL-MOTS includes 106,089 instance masks for 1,632 distinct objects in 40 video recordings. My effort exceeds the state-of-the-art KITTI MOTS dataset in terms of dataset scale, object density and variation, and scene diversity. Results of extensive cross-dataset evaluations indicate significant performance improvements for several state-of-the-art methods trained on the DGL-MOTS dataset. I believe the FGA algorithm and the DGL-MOTS dataset hold valuable potential to advance MOTS research for autonomous driving.
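To make the two core ideas named above concrete, namely flow-guided feature enhancement and associative connections across task heads, the sketch below shows one plausible arrangement in PyTorch. The function and module names (warp_with_flow, FGASketch), channel sizes, and the dense per-pixel heads are illustrative assumptions under simplified conditions, not the dissertation's actual FGA implementation; in particular, per-detection RoI pooling is omitted and the heads operate on full feature maps for brevity.

```python
# Minimal, illustrative sketch of a flow-guided association (FGA) style
# architecture. All names and shapes here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_with_flow(feat_prev, flow):
    """Warp previous-frame features toward the current frame with a flow field.

    feat_prev: (B, C, H, W) backbone features from frame t-1
    flow:      (B, 2, H, W) per-pixel (dx, dy) offsets in pixel units
    """
    b, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat_prev.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                                 # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                              # (B, H, W, 2)
    return F.grid_sample(feat_prev, grid, align_corners=True)


class FGASketch(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, 1)          # fuse current + warped features
        self.det_tower = nn.Conv2d(c, c, 3, padding=1)  # detection-head features
        self.det_out = nn.Conv2d(c, 4, 1)           # placeholder box-regression map
        self.seg_head = nn.Conv2d(2 * c, 1, 1)      # mask logits
        self.trk_head = nn.Conv2d(2 * c, 32, 1)     # association embeddings

    def forward(self, feat_t, feat_tm1, flow):
        # Flow-guided enhancement: warp frame t-1 features onto frame t.
        shared = self.fuse(torch.cat((feat_t, warp_with_flow(feat_tm1, flow)), dim=1))
        det_feat = F.relu(self.det_tower(shared))
        det = self.det_out(det_feat)
        # Associative connections: segmentation and tracking reuse the
        # detection-head features so the upper-level tasks concentrate on
        # detector-supported regions.
        seg = self.seg_head(torch.cat((shared, det_feat), dim=1))
        emb = self.trk_head(torch.cat((shared, det_feat), dim=1))
        return det, seg, emb


if __name__ == "__main__":
    f_t = torch.randn(2, 64, 32, 32)    # current-frame backbone features
    f_tm1 = torch.randn(2, 64, 32, 32)  # previous-frame backbone features
    flow = torch.zeros(2, 2, 32, 32)    # zero flow, i.e. no apparent motion
    det, seg, emb = FGASketch()(f_t, f_tm1, flow)
    print(det.shape, seg.shape, emb.shape)
```

The design choice worth noting is that the segmentation and tracking heads take the detection features as an extra input rather than branching independently from the backbone; that is the sense in which the upper-level tasks are guided toward regions the detector supports.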
