Masked Regions Research Articles

Facial image inpainting is a challenging problem because it requires generating new pixels that carry semantic information for masked key components of a face, e.g., the eyes and nose. Several notable methods have recently been proposed in this field. Most of them use encoder–decoder architectures and have limitations, such as producing only a single result for a given image and mask. Alternatively, some optimization-based approaches generate promising results for different masks using generator networks, but they are computationally more expensive. In this paper, we propose an efficient solution to the facial image inpainting problem using the Cyclic Reverse Generator (CRG) architecture, which provides an encoder-generator model. We use the encoder to embed a given image into the generator space and incrementally inpaint the masked regions until a plausible image is generated; a discriminator model, trained for this purpose, assesses the quality of the generated images during the iterations and determines convergence. After generation, as a post-processing step, we use a U-Net model trained specifically for this task to remedy artifacts close to the mask boundaries. We empirically observed that, even in the absence of important facial features, the encoder can embed images in semantically rich regions of the latent space by utilizing the surrounding context in the images. The feedback loop between the encoder and the generator gradually improves the missing content, and only a few iterations are sufficient to generate realistic content. Since the models are not trained for particular mask types, our method supports sketch-based inpainting and a variety of mask types, and can produce multiple, diverse results. We compared our method with state-of-the-art models both quantitatively and qualitatively, and observed that it is competitive with them on all mask types; its advantage is most pronounced on images with larger masks. Our code, dataset and models are available at: https://github.com/yahyadogan72/iterative_facial_image_inpainting.
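The iterative loop described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `encode`, `generate` and `score` are hypothetical callables standing in for the trained encoder, generator and discriminator, the toy versions below are trivial placeholders, and the U-Net post-processing step is omitted.

```python
import numpy as np


def iterative_inpaint(image, mask, encode, generate, score,
                      max_iters=10, threshold=0.99):
    """Fill masked pixels by cycling image -> latent -> image.

    mask is 1.0 where pixels are missing and 0.0 where they are known.
    encode, generate and score are hypothetical stand-ins for the
    trained encoder, generator and discriminator.
    """
    known = image * (1.0 - mask)
    current = known.copy()                 # masked pixels start at zero
    steps = 0
    for steps in range(1, max_iters + 1):
        latent = encode(current)           # embed the partially filled image
        proposal = generate(latent)        # decode a full candidate image
        current = known + proposal * mask  # keep known pixels, update the hole
        if score(current) >= threshold:    # quality check decides convergence
            break
    return current, steps


# Toy stand-ins (assumptions for illustration only).
image = np.full((4, 4), 0.8)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                       # hide the 2x2 centre


def toy_encode(img):
    return float(img.mean())               # "latent" = mean intensity


def toy_generate(latent):
    return np.full((4, 4), latent)         # constant image from the latent


def toy_score(img):
    hole = img[mask == 1.0].mean()
    context = img[mask == 0.0].mean()
    return 1.0 - abs(hole - context)       # 1.0 when the hole matches context


result, steps = iterative_inpaint(image, mask, toy_encode,
                                  toy_generate, toy_score)
```

With these placeholders, each pass re-embeds the partially filled image, so the masked region moves closer to its surrounding context while the known pixels are left untouched, which mirrors the feedback loop the abstract describes.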

A text-based speech editor allows speech to be edited through intuitive cut, copy, and paste operations, which speeds up the editing process. However, the major drawback of current systems is that the edited speech often sounds unnatural because of the cut-copy-paste operation. In addition, it is not obvious how to synthesize recordings for a new word that does not appear in the transcript, which often requires text-to-speech (TTS) and voice conversion (VC) technology at the same time. First, this paper proposes a novel end-to-end text-based speech editing method called the context-aware mask prediction network (CampNet). The model simulates the text-based speech editing process by randomly masking part of the speech and then predicting the masked region from the speech context. It can resolve unnatural prosody in the edited region and synthesize speech for words unseen in the transcript. Secondly, we design three text-based operations based on CampNet: deletion, insertion, and replacement. These operations cover a variety of speech editing situations. Thirdly, to synthesize speech for long text in insertion and replacement operations, we propose a word-level autoregressive generation method that can synthesize speech for text of arbitrary length. Fourthly, we propose a speaker adaptation method for CampNet that uses only one sentence, and explore the few-shot learning ability of CampNet, which provides a new idea for speech forgery tasks. Subjective and objective experiments on the VCTK and LibriTTS datasets (examples of generated speech can be found at https://hairuo55.github.io/CampNet) show that the speech editing results based on CampNet are better than those of TTS technology, manual editing, and the VoCo method (the combination of TTS and VC). We also conduct detailed ablation experiments to explore the effect of the CampNet structure on its performance. Finally, the experiments show that speaker adaptation with only one sentence can further improve the naturalness of speech editing in one-shot learning.
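The two ideas in this abstract, masking part of the speech for prediction and expressing edits as deletion, insertion, or replacement, can be illustrated with a small sketch. It is not CampNet itself: `random_span_mask` and `plan_edit` are hypothetical helper names, the single contiguous span is an assumption about the masking scheme, and the sketch only marks which words would need to be synthesized rather than generating any audio.

```python
import numpy as np


def random_span_mask(num_frames, mask_ratio, rng):
    """Hide one contiguous span of frames (True = masked).

    A single contiguous span is an assumption made for this sketch.
    """
    span = min(num_frames, max(1, int(round(num_frames * mask_ratio))))
    start = int(rng.integers(0, num_frames - span + 1))
    mask = np.zeros(num_frames, dtype=bool)
    mask[start:start + span] = True
    return mask


def plan_edit(words, op, index, new_word=None):
    """Apply a text edit and flag the words whose speech must be predicted.

    Returns the edited transcript and a parallel list of booleans,
    where True marks a word with no recorded speech.
    """
    edited = list(words)
    to_predict = [False] * len(words)
    if op == "delete":
        del edited[index]
        del to_predict[index]
    elif op == "insert":
        edited.insert(index, new_word)
        to_predict.insert(index, True)
    elif op == "replace":
        edited[index] = new_word
        to_predict[index] = True
    else:
        raise ValueError(f"unknown operation: {op}")
    return edited, to_predict


rng = np.random.default_rng(0)
frame_mask = random_span_mask(num_frames=100, mask_ratio=0.2, rng=rng)

transcript = ["the", "cat", "sat"]
replaced = plan_edit(transcript, "replace", 1, "dog")
inserted = plan_edit(transcript, "insert", 1, "big")
deleted = plan_edit(transcript, "delete", 1)
```

In a model of this kind, the flagged positions would play the same role at editing time as the randomly masked span does at training time: both are regions to be predicted from the surrounding speech context.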

Articles published on Masked Regions

Iterative facial image inpainting based on an encoder-generator architecture

AFD-StackGAN: Automatic Mask Generation Network for Face De-Occlusion Using StackGAN.

Pre-training Model Based on Parallel Cross-Modality Fusion Layer.

Unsupervised person re-identification via simultaneous clustering and mask prediction

An Intelligent Deep Learning Enabled Marine Fish Species Detection and Classification Model

Deep Learning-Based Chest CT Image Features in Diagnosis of Lung Cancer.

Underwater Fish Detection and Counting Using Mask Regional Convolutional Neural Network

E2F-GAN: Eyes-to-Face Inpainting via Edge-Aware Coarse-to-Fine GANs

Deep Reinforcement Learning Enabled Smart City Recycling Waste Object Classification

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

Deep Learning Based Hand Wrist Segmentation using Mask R-CNN

Intelligent Deep Learning Based Automated Fish Detection Model for UWSN

A Naive and Semantic Approach for Detecting Face Mask Region Based Convolutional Neural Networks (R-CNN)

Breed recognition and estimation of live weight of cattle based on methods of machine learning and computer vision

Implementation and Practice of Deep Learning-Based Instance Segmentation Algorithm for Quantification of Hepatic Fibrosis at Whole Slide Level in Sprague-Dawley Rats.

Automatic vectorization of point symbols on archive maps using deep convolutional neural network

Analyzing fibrous tissue pattern in fibrous dysplasia bone images using deep R-CNN networks for segmentation

Efficient masked face recognition method during the COVID-19 pandemic.

Segmentation of Overlapping Cervical Cells with Mask Region Convolutional Neural Network.

Deep leaf: Mask R-CNN based leaf detection and segmentation from digitized herbarium specimen images
