Abstract
Human segmentation and tracking often use the outcome of person detection in the video. Thus, the results of segmentation and tracking depend heavily on human detection results in the video. With the advent of Convolutional Neural Networks (CNNs), there are excellent results in this field. Segmentation and tracking of the person in the video have significant applications in monitoring and estimating human pose in 2D images and 3D space. In this paper, we performed a survey of many studies, methods, datasets, and results for human segmentation and tracking in video. We also touch upon detecting persons as it affects the results of human segmentation and human tracking. The survey is performed in great detail up to source code paths. The MADS (Martial Arts, Dancing and Sports) dataset comprises fast and complex activities. It has been published for the task of estimating human posture. However, before determining the human pose, the person needs to be detected as a segment in the video. Moreover, in the paper, we publish a mask dataset to evaluate the segmentation and tracking of people in the video. In our MASK MADS dataset, we have prepared 28 k mask images. We also evaluated the MADS dataset for segmenting and tracking people in the video with many recently published CNNs methods.
Highlights
Human segmentation and tracking in the video are two crucial problems in computer vision
Object detection in images and videos is the first operation applied in computer vision pipelines such as object segmentation, object identification, or object localization
Some results of human detection in images of MOTChellange dataset before human tracking based on the Convolutional Neural Networks (CNNs) are shown in SNN is a human tracking evaluation based on the Siamese Neural Network
Summary
Human segmentation and tracking in the video are two crucial problems in computer vision. Segmentation is the process of separating human data from other data in a complex scene of an image. This problem is widely applied in recognizing the activities of humans in the video. Human tracking extracts the person’s position during the video and is applied in many tasks such as monitoring and surveillance. The MADS dataset is a benchmark dataset for evaluating human pose estimation. This dataset includes activities in traditional martial arts (tai-chi and karate), dancing (hip-hop and jazz), and sports (basketball, volleyball, football, rugby, tennis, badminton)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.