Abstract

We present a novel silhouette extraction algorithm designed for the binary segmentation of swimmers underwater. The intended use of this algorithm is within a 2D-to-3D pipeline for the markerless motion capture of swimmers, a task that has not yet been achieved satisfactorily, partly due to the absence of silhouette extraction methods that work well on images of swimmers. Our algorithm, FISHnet, was trained on the novel Scylla dataset, which contains 3,100 images (and corresponding hand-traced silhouettes) of swimmers underwater, and achieved a Dice score of 0.9712 on its test data. Our algorithm uses a U-Net-like architecture with VGG16 as a backbone. It introduces two novel modules: a modified version of the Semantic Embedding Branch module from ExFuse, which increases the complexity of the features learned by the layers of the encoder; and the Spatial Resolution Enhancer module, which increases the spatial resolution of the features of the decoder before they are fused, via skip connections, with the features of the encoder. The contribution of these two modules to the performance of our network was marginal, and we attribute this result to the limited amount of data on which our network was trained. Nevertheless, our model outperformed state-of-the-art silhouette extraction algorithms (namely DeepLabv3+) on Scylla, and it is the first algorithm developed specifically for the task of accurately segmenting the silhouettes of swimmers.
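
To make the decoder-side idea concrete, below is a minimal PyTorch sketch of a skip connection in which the decoder features are first passed through a Spatial Resolution Enhancer before being fused with the matching encoder features. The internals shown (bilinear upsampling followed by a 3x3 convolution) are assumptions made for illustration only; the paper's exact module design may differ.

# Minimal sketch: upsample decoder features before the skip connection.
# The SRE internals below are hypothetical, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialResolutionEnhancer(nn.Module):
    """Hypothetical stand-in: bring decoder features up to the encoder's
    spatial resolution, then refine them with a convolution."""
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, dec_feat, enc_feat):
        # Match the (higher) spatial resolution of the encoder features.
        dec_up = F.interpolate(dec_feat, size=enc_feat.shape[2:],
                               mode="bilinear", align_corners=False)
        return self.refine(dec_up)

class SkipFusion(nn.Module):
    """Fuse the enhanced decoder features with the encoder features,
    U-Net style, by channel-wise concatenation and a convolution."""
    def __init__(self, enc_ch, dec_ch, out_ch):
        super().__init__()
        self.sre = SpatialResolutionEnhancer(dec_ch)
        self.fuse = nn.Conv2d(enc_ch + dec_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, enc_feat, dec_feat):
        dec_enhanced = self.sre(dec_feat, enc_feat)
        return self.fuse(torch.cat([enc_feat, dec_enhanced], dim=1))

# Example usage with dummy feature maps:
# enc = torch.randn(1, 64, 128, 128)   # encoder features (higher resolution)
# dec = torch.randn(1, 128, 64, 64)    # decoder features (lower resolution)
# out = SkipFusion(enc_ch=64, dec_ch=128, out_ch=64)(enc, dec)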

Highlights

  • The ultimate goal of sports motion capture is to accurately and automatically reconstruct the 3D locations of the joints of an athlete in motion just from one or more images [2]

  • To investigate how much each element of our network contributed to the overall accuracy achieved, we report in Table 3 an ablation study that shows the Dice score achieved when each subset of elements was active (the metric is sketched after this list)

  • Accurate markerless motion capture of underwater swimmers has not been achieved yet, mainly because there is no algorithm that can accurately segment the silhouettes of swimmers underwater: traditional algorithms like background subtraction are too noisy, and off-the-shelf pre-trained models fail to generalise to images of swimmers
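
For reference, the Dice score used throughout measures the overlap between a predicted binary silhouette and its hand-traced ground truth. A minimal NumPy implementation follows; the smoothing term eps is an assumption added here to avoid division by zero on empty masks.

# Dice = 2*|P ∩ T| / (|P| + |T|) for binary masks P (prediction) and T (target).
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Overlap between two binary masks; 1.0 means a perfect match."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)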


Summary

Introduction

The ultimate goal of sports motion capture is to accurately and automatically reconstruct the 3D locations of the joints of an athlete in motion just from one or more images (rather than by using sensors or markers attached to the body of the athlete) [2]. Modern image-based 2D-to-3D methods (referred to as markerless motion capture systems) extract from each recorded image one or both of two types of information: the silhouette of the athlete and/or the location of the joints in image coordinates [3], [4]. The literature largely agrees on the best options for extracting the 2D locations of the joints (collectively referred to as the 2D pose): they can be either digitised manually [4] or extracted automatically by a deep neural network, such as the Stacked Hourglass network [7], [12] or OpenPose [13]. Authors who use silhouettes as inputs must rely on some type of algorithm to extract them automatically from the images. In scenes in which the lighting is stable and the contrast between the subject and the background is sharp, silhouette extraction can be performed by recording a reference image of the background (i.e. without the subject present in the scene) and subtracting that reference image from all the images in which the subject is present.
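
As an illustration of this classical approach, the following OpenCV sketch thresholds the absolute difference between a frame and a reference image of the empty scene. The file names and the threshold value of 30 are arbitrary choices for the example, not values from the paper.

# Background subtraction: pixels that differ from the empty-scene reference
# by more than a threshold are marked as foreground (the silhouette).
import cv2

background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)  # empty scene
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)            # subject present

diff = cv2.absdiff(frame, background)
_, silhouette = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
cv2.imwrite("silhouette.png", silhouette)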

