Markerless vision-based teleoperation that leverages innovations in computer vision offers the advantage of natural and noninvasive finger motions for controlling multifingered robot hands. However, current pose estimation methods still suffer from inaccuracy caused by self-occlusion of the fingers. Herein, we develop a novel vision-based hand-arm teleoperation system that captures the human hand from an optimal viewpoint and at a suitable distance. This teleoperation system consists of an end-to-end hand pose regression network and a controlled active vision system. The end-to-end pose regression network, Transteleop, observes the human hand through a low-cost depth camera and predicts the robot's joint commands based on an image-to-image translation approach, aided by an auxiliary reconstruction loss function. To obtain an optimal observation of the human hand, an active vision system, implemented with a robot arm at the local site, ensures the high accuracy of the proposed neural network. Human arm motions are simultaneously mapped to the slave robot arm under relative control. Quantitative network evaluation and a variety of complex manipulation tasks, for example, tower building, pouring, and multitable cup stacking, demonstrate the practicality and stability of the proposed teleoperation system.
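To illustrate the idea of an end-to-end pose regression network trained with an auxiliary reconstruction loss, the sketch below shows one possible PyTorch structure: a shared depth-image encoder feeding both a joint-command regression head and an image-to-image reconstruction decoder. This is not the authors' implementation; the layer sizes, input resolution, joint count (`num_joints=22`), and loss weighting (`alpha`) are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch (not the paper's actual architecture) of a pose regression
# network with an auxiliary image-reconstruction branch. All dimensions and
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

class PoseRegressionSketch(nn.Module):
    def __init__(self, num_joints=22):  # assumed joint count for a multifingered hand
        super().__init__()
        # Encoder: compresses a 1-channel depth crop of the human hand.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 96 -> 48
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 48 -> 24
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 24 -> 12
        )
        # Decoder: auxiliary image-to-image branch that reconstructs a depth
        # image from the shared embedding (the auxiliary reconstruction task).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Regression head: maps the shared embedding to robot joint commands.
        self.joint_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 12 * 12, 256), nn.ReLU(),
            nn.Linear(256, num_joints),
        )

    def forward(self, depth):
        feat = self.encoder(depth)
        return self.joint_head(feat), self.decoder(feat)


def training_loss(pred_joints, gt_joints, recon, target_img, alpha=0.1):
    """Joint regression loss plus a weighted auxiliary reconstruction loss."""
    joint_loss = nn.functional.mse_loss(pred_joints, gt_joints)
    recon_loss = nn.functional.l1_loss(recon, target_img)
    return joint_loss + alpha * recon_loss


# Example forward pass on a batch of 96x96 depth crops of the human hand.
model = PoseRegressionSketch()
depth = torch.randn(8, 1, 96, 96)
joints, recon = model(depth)
print(joints.shape, recon.shape)  # torch.Size([8, 22]) torch.Size([8, 1, 96, 96])
```

The design choice reflected here is that the reconstruction branch only shapes the shared embedding during training; at teleoperation time, only the encoder and the joint head would be needed to map a depth image to joint commands.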