Abstract

To detect and remove face masks automatically, without user interaction, we present a GAN-based approach for face de-occlusion called Automatic Mask Generation Network for Face De-occlusion Using Stacked Generative Adversarial Networks (AFD-StackGAN). We decompose the problem into two stages (Stage-I Network and Stage-II Network) and employ a separate GAN in each stage. Stage-I Network (Binary Mask Generation Network) automatically creates a binary mask for the masked region in the occluded input images. Stage-II Network (Face De-occlusion Network) then removes the mask object and synthesizes the damaged region with fine details while retaining the restored face’s appearance and structural consistency. To train the proposed model, we create a paired synthetic face-occluded dataset from the publicly available CelebA face images. AFD-StackGAN is evaluated on real-world test images gathered from the Internet. Our extensive experimental results confirm the robustness and efficiency of the proposed model in removing complex mask objects from facial images compared to previous image manipulation approaches. Additionally, we provide ablation studies that compare performance with user-defined and auto-defined masks and demonstrate the benefits of the refiner networks in the generation process.

Highlights

  • Face occlusion, increasingly common worldwide in recent years, is one of the leading causes of difficulty in computer vision tasks such as face recognition, identification, tracking, detection, classification, face parsing, and contour extraction

  • This work automatically eliminates challenging mask objects from the face and synthesizes the damaged area with fine details while retaining the restored face’s appearance and structural consistency

  • This work attempts to alleviate the manual mask selection burden by creating a straightforward method that can intelligently and automatically generate the occluded region’s binary mask in facial images

  • One potential application of an automatic mask generation network could be a video where mask objects continuously conceal the face’s structural semantics

  • We experimentally show that the proposed model with an automatically generated mask is more effective than those with manually generated masks for removing mask objects and generating realistic semantics of face images

  • The first row contains input images, the second row shows the corresponding binary masks generated by the mask generation network, the third row contains the masks refined by the mask refiner network, and the last two rows show the output of Stage-II Network

Summary

Object Detection Methods

Object detection is the process of finding various objects in an image. Face occlusion detection aims to detect the facial region occluded by other objects. Several variants of the fully convolutional network (FCN), such as [8,9,10], have been proposed to make it more appropriate for image segmentation tasks. All these approaches use a modified version of the classification network (removing its fully connected layers and replacing them with typical convolutional layers) as an encoder to produce a low-resolution image representation. The Se-GAN segmentor network takes an image and the visible area of an object as its input and generates the mask of the whole occluded object. Multi-Task GAN (MT-GAN) [14] uses a super-resolution network (SRN) to up-scale a small, distorted image into a large, clear image for better detection. Instead of using these expensive algorithms to automatically detect non-face objects in facial images, we employ a simple encoder-decoder network architecture that focuses on mask objects. The encoder consists of three convolution layers, and the decoder consists of three transpose convolution layers.
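
To illustrate why three transpose convolution layers can return a mask at the input resolution after three convolution layers, the following minimal sketch traces the spatial size through such an encoder-decoder. The kernel size (4), stride (2), padding (1), and 256-pixel input are assumptions chosen for illustration; the summary above does not state the actual hyperparameters, and the helper names are hypothetical.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one strided convolution (assumed hyperparameters)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial size after one transpose convolution (assumed hyperparameters)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_sizes(size, n_layers=3):
    """Return the spatial size after each encoder and decoder layer."""
    sizes = [size]
    for _ in range(n_layers):   # encoder: three convolutions
        sizes.append(conv_out(sizes[-1]))
    for _ in range(n_layers):   # decoder: three transpose convolutions
        sizes.append(deconv_out(sizes[-1]))
    return sizes


if __name__ == "__main__":
    # Assumed 256x256 input; each encoder layer halves the size and
    # each decoder layer doubles it.
    print(trace_sizes(256))  # [256, 128, 64, 32, 64, 128, 256]
```

With these assumed settings the decoder mirrors the encoder exactly, so the predicted binary mask aligns pixel for pixel with the occluded input image.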

Object Removal Methods
Our Approach
Stage-I Network
Stage-II Network
Total Loss Function
Experiments
Training and Implementation Details
Competing Methods
Synthetic Generated Dataset
Real-World Generated Dataset
Performance Evaluation Metrics
Results and Comparisons
Results of Stage-I Network
Results of Stage-II Network
Qualitative Comparisons
Quantitative Comparisons
Performance Comparison between Using User-Defined Mask and
Role of Refiner Networks
Conclusions
