Human Parsing Research Articles

Interactive segmentation aims to generate high-quality pixel-level predictions from a few user-provided clicks, and is gaining attention for its convenience in annotating segmentation data. Users can iteratively refine the prediction by adding clicks until the result is satisfactory. Existing interactive methods usually transform the clicks into a set of localization maps by Euclidean distance computation or RGB texture extraction to guide the segmentation, which makes the click transformation a core module in interactive segmentation networks. However, when applied to human images, where large poses, occlusions, and poor illumination are prevalent, prior transformation methods tend to cause uncorrectable overlap across localization maps, i.e., one click corresponds to multiple transformed values at the same position in different localization map channels, which makes it difficult to match clicks to human parts and limits interaction efficiency. Furthermore, inappropriately transformed information is hard to refine under a static transformation, i.e., one based on fixed formulas or RGB textures, which is out of step with the dynamically refined interaction process. Hence, we design a dynamic transformation scheme for interactive human parsing (IHP) named Dynamic Interaction Dilation Net (DID-Net), which serves as an initial attempt to break the limitations of static transformation while capturing long-range dependencies of clicks within each human part.
Specifically, we construct a Dynamic Dilation Module (DD-Module) to dilate clicks radially in several directions, assisted by human body edge detection. The continually refined edges improve the dilation quality in each interaction iteration, thereby better fitting the user's intention. Furthermore, we propose an Adaptive Interaction Excitation Block (AIE-Block) to exploit potential semantic clues buried in the dilated clicks and emphasize semantic expression for each human part by feature recalibration. Our DID-Net achieves state-of-the-art performance on 3 public human parsing benchmarks.
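As a rough illustration of the static click transformation that the abstract criticizes (not of DID-Net itself), the sketch below turns clicks into one truncated Euclidean distance map per human part. The function name `click_distance_maps`, the `cap` parameter, and the part names are assumptions made for this example; the abstract does not specify any of them.

```python
import math


def click_distance_maps(height, width, clicks_by_part, cap=255.0):
    """Build one truncated Euclidean distance map per part (hypothetical helper).

    clicks_by_part maps a part name to a list of (row, col) clicks.
    Each map stores, per pixel, the distance to the nearest click of that
    part, truncated at `cap`. A part without clicks gets a map filled with `cap`.
    """
    maps = {}
    for part, clicks in clicks_by_part.items():
        grid = [[cap] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                for cr, cc in clicks:
                    d = math.hypot(r - cr, c - cc)
                    if d < grid[r][c]:
                        grid[r][c] = d
        maps[part] = grid
    return maps


# Assumed toy input: two parts clicked close to each other on a 5x5 image.
maps = click_distance_maps(5, 5, {"arm": [(2, 1)], "torso": [(2, 3)]})

# The pixel between the two clicks receives the same value in both channels,
# so the maps alone cannot say which part it belongs to.
print(maps["arm"][2][2], maps["torso"][2][2])
```

Because the formula is fixed, adding more clicks only changes the maps near the new clicks; the transformation itself never adapts to the refined prediction, which is the limitation the abstract attributes to static schemes.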

Multiple human parsing (MHP) is typically treated as two sub-tasks, i.e., instance separation and body part segmentation. Existing methods usually tackle the sub-tasks with a two-stage strategy, which treats MHP as an ROI-based (i.e., detect-then-segment) or grouping-based (i.e., segment-then-group) paradigm. However, the strong dependence between the two sub-tasks limits the potential of an MHP method, since the second stage often requires high-quality predictions from the first. Besides, the separate models responsible for the two sub-tasks bring a significant computational burden. Unlike existing methods, we regard MHP as a hierarchical set prediction problem and handle the two sub-tasks using several landmarks of body parts. Motivated by this, we propose a novel multiple human parser with representative sets, termed ReSParser. In ReSParser, several landmarks of body parts are hierarchically estimated, resulting in coarse-to-fine representative sets. After that, each representative set is adaptively responsible for segmenting pixels into semantically consistent regions belonging to the corresponding person. In this manner, ReSParser simultaneously addresses the two sub-tasks in a fully convolutional fashion, thus eliminating the dependence between them and significantly reducing computational complexity. Extensive experiments on two challenging benchmarks demonstrate that ReSParser is an efficient framework with superior parsing performance, significantly outperforming other ROI-free and grouping-free methods. Besides, it achieves results competitive with those of the best two-stage methods such as RP-RCNN, but requires a much lower inference time, showing a good precision-speed trade-off. Code and models are publicly available at https://github.com/JosonChan1998/RepParser.
We hope the ReSParser serves as a new baseline for multiple human parsing research in the future.

Articles published on Human Parsing

Data augmentation in human-centric vision

Knowledge enhanced multi-task learning for simultaneous optimization of human parsing and pose estimation

Learning differentiable categorical regions with Gumbel-Softmax for person re-identification

FCGNet: Foreground and Class Guided Network for human parsing

Explore human parsing modality for action recognition

From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing.

WNet: A dual‐encoded multi‐human parsing network

Modality adaptation via feature difference learning for depth human parsing

A multitask tensor-based relation network for cloth-changing person re-identification

Semantically enhanced attention map‐driven occluded person re‐identification

CycleVTON: A Cycle Mapping Framework for Parser-Free Virtual Try-On

Deep Learning Technique for Human Parsing: A Survey and Outlook

Prior based Pyramid Residual Clique Network for human body image super-resolution

Causality and signalling of garden-path sentences.

Reducing vulnerable internal feature correlations to enhance efficient topological structure parsing

Toward Accurate Human Parsing Through Edge Guided Diffusion.

CPI-Parser: Integrating Causal Properties Into Multiple Human Parsing.

Dynamic Interaction Dilation for Interactive Human Parsing

ReSParser: Fully Convolutional Multiple Human Parsing With Representative Sets

Prior-structure Driven Weakly-supervised Learning for Fine-grained Human Parsing
