Textureless Regions Research Articles

The task of semi-supervised video object segmentation (VOS) has been greatly advanced, and state-of-the-art performance has been achieved by dense matching-based methods. Recent methods leverage space-time memory (STM) networks and learn to retrieve relevant information from all available sources: the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. However, when forming the memory and performing matching, these methods exploit only the appearance information and ignore the motion information. In this paper, we advocate for the return of the motion information and propose a motion uncertainty-aware framework (MUNet) for semi-supervised VOS. First, we propose an implicit method to learn the spatial correspondences between neighboring frames, building upon a correlation cost volume. To handle the challenging cases of occlusion and textureless regions when constructing dense correspondences, we incorporate the uncertainty in dense matching and achieve a motion uncertainty-aware feature representation. Second, we introduce a motion-aware spatial attention module to effectively fuse the motion features with the semantic features. Comprehensive experiments on challenging benchmarks show that using a small amount of data and combining it with powerful motion information can bring a significant performance boost. We achieve 76.5% J&F using only DAVIS17 for training, which significantly outperforms the SOTA methods under the low-data protocol. (This result is initialized with Mask-RCNN-ResNet50 weights pre-trained on the COCO dataset. With initialization from ResNet50 pre-trained on the ImageNet dataset, we achieve 75.0% J&F, which is still the SOTA performance.) The code and supplementary materials will be available at https://npucvr.github.io/MUNet.
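To make the idea of uncertainty in dense matching concrete, here is a minimal sketch, not MUNet's actual implementation. It assumes a dot-product correlation between one query feature and a few candidate features, and it uses the normalized entropy of the softmax over candidates as the uncertainty measure. The function names and the choice of entropy are illustrative assumptions; the abstract does not specify how the uncertainty is computed.

```python
import math

def correlation_scores(query_feat, candidate_feats):
    """Dot-product correlation between one query feature and each candidate."""
    return [sum(q * c for q, c in zip(query_feat, cand)) for cand in candidate_feats]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matching_uncertainty(scores):
    """Normalized entropy in [0, 1] of the match distribution.

    Assumes at least two candidates. Values near 1 mean every candidate
    looks equally good (ambiguous match); values near 0 mean one
    candidate clearly wins.
    """
    probs = softmax(scores)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))

# Hypothetical toy features: a distinctive patch versus a textureless one.
textured = correlation_scores([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])
textureless = correlation_scores([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])

low_uncertainty = matching_uncertainty(textured)
high_uncertainty = matching_uncertainty(textureless)
```

In the textureless case all candidates correlate equally with the query, so the match distribution is uniform and the uncertainty is maximal. That is the situation in which a matching-based method would want to down-weight the motion cue.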

Learning-based multi-view stereo (MVS) has so far centered around 3D convolution on cost volumes. Due to the high computation and memory consumption of 3D CNNs, the resolution of the output depth is often considerably limited. Unlike most existing works, which are dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range (depth) finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization, which is much more lightweight than full cost volume optimization. In particular, we propose RayMVSNet, which learns sequential prediction of a 1D implicit field along each camera ray, with the zero-crossing point indicating scene depth. This sequential modeling, conducted based on transformer features, essentially learns the epipolar line search in traditional multi-view stereo. We devise a multi-task learning scheme for better optimization convergence and depth accuracy. We find that the monotonicity property of the SDFs along each ray greatly benefits the depth estimation. Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods, achieving an overall reconstruction score of 0.33 mm on DTU and an F-score of 59.48% on Tanks & Temples. It is able to produce high-quality depth estimation and point cloud reconstruction in challenging scenarios such as objects/scenes with non-textured surfaces, severe occlusion, and highly varying depth range. Further, we propose RayMVSNet++ to enhance contextual feature aggregation for each ray by designing an attentional gating unit that selects semantically relevant neighboring rays within the local frustum around that ray. This improves the performance on datasets with more challenging examples (e.g., low-quality images caused by poor lighting conditions or motion blur). RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset. In particular, it attains an AbsRel of 0.058m and produces accurate results on the two subsets of textureless regions and large depth variation.
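The zero-crossing idea can be illustrated with a small sketch. This is not the RayMVSNet network, only the final read-out step under stated assumptions: signed distance values have already been predicted at sampled depths along one ray, they are positive in front of the surface and negative behind it, and the depth is recovered by linear interpolation at the first sign change. The function name and the interpolation scheme are hypothetical.

```python
def zero_crossing_depth(depths, sdf_values):
    """Return the depth where the signed distance first crosses zero.

    depths:     increasing sample depths along one camera ray
    sdf_values: signed distance at each sample (positive in front of the
                surface, negative behind it)
    Returns None when the samples never cross zero.
    """
    if len(depths) != len(sdf_values):
        raise ValueError("depths and sdf_values must have the same length")
    if sdf_values and sdf_values[0] == 0.0:
        return depths[0]
    for i in range(len(depths) - 1):
        s0, s1 = sdf_values[i], sdf_values[i + 1]
        if s0 > 0.0 >= s1:
            # Linear interpolation between the two samples bracketing the surface.
            t = s0 / (s0 - s1)
            return depths[i] + t * (depths[i + 1] - depths[i])
    return None

# Hypothetical samples along one ray: the surface lies between depth 2 and 3.
surface_depth = zero_crossing_depth([1.0, 2.0, 3.0, 4.0], [1.5, 0.5, -0.5, -1.5])
no_surface = zero_crossing_depth([1.0, 2.0], [1.0, 0.5])
```

When the signed distances decrease monotonically along the ray, as the abstract notes, there is at most one sign change, so the first crossing found by this scan is the only one.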

Articles published on Textureless Regions

Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement.

Efficiency-Accuracy Trade-Off in Light Field Estimation with Cost Volume Construction and Aggregation.

TSAR-MVS: Textureless-aware segmentation and correlative refinement guided multi-view stereo

Redefining the Laparoscopic Spatial Sense: AI-Based Intra- and Postoperative Measurement from Stereoimages

Mono‐MVS: textureless‐aware multi‐view stereo assisted by monocular prediction

Light Field Depth Estimation via Stitched Epipolar Plane Images.

DebSDF: Delving into the Details and Bias of Neural Indoor Scene Reconstruction.

Unsupervised Stereo Matching with Surface Normal Assistance for Indoor Depth Estimation.

PanoVLM: Low-Cost and accurate panoramic vision and LiDAR fused mapping

Joint Optimization of Depth and Ego-Motion for Intelligent Autonomous Vehicles

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

CbwLoss: Constrained Bidirectional Weighted Loss for Self-Supervised Learning of Depth and Pose

Multi-scale foreground-background separation for light field depth estimation with deep convolutional networks

MUNet: Motion uncertainty-aware semi-supervised video object segmentation

A Multiview Stereo Algorithm Based on Image Segmentation Guided Generation of Planar Prior for Textureless Regions of Artificial Scenes

HIPA: Hierarchical Patch Transformer for Single Image Super Resolution.

Hierarchical Belief Propagation on Image Segmentation Pyramid.

RayMVSNet++: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo.

NeuralRoom

SurRF: Unsupervised Multi-View Stereopsis by Learning Surface Radiance Field.
