SceneFlow: Synthesizing indoor scenes via geometry-enhanced flow matching

Similar Papers
  • Research Article
  • Citations: 5
  • 10.1186/s42492-019-0030-9
Indoor versus outdoor scene recognition for navigation of a micro aerial vehicle using spatial color gist wavelet descriptors
  • Nov 26, 2019
  • Visual Computing for Industry, Biomedicine, and Art
  • Anitha Ganesan + 1 more

In the context of improved navigation for micro aerial vehicles, a new scene recognition visual descriptor, called the spatial color gist wavelet descriptor (SCGWD), is proposed. SCGWD was developed by combining the proposed Ohta color-GIST wavelet descriptors with census transform histogram (CENTRIST) spatial pyramid representation descriptors for categorizing indoor versus outdoor scenes. Binary and multiclass support vector machine (SVM) classifiers with linear and non-linear kernels were used to classify indoor versus outdoor scenes and indoor scenes, respectively. This paper also discusses the feature extraction methodology of several state-of-the-art visual descriptors and four proposed visual descriptors (Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, enhanced Ohta color histogram descriptors, and SCGWDs) from an experimental perspective. The proposed descriptors and state-of-the-art visual descriptors were evaluated using the Indian Institute of Technology Madras Scene Classification Image Database 2 (IITM SCID2), an Indoor-Outdoor Dataset, and the Massachusetts Institute of Technology indoor scene classification dataset (MIT-67). Experimental results showed that the indoor versus outdoor scene recognition algorithm, employing SVM with SCGWDs, produced the highest classification rates (CRs): 95.48% and 99.82% using a radial basis function (RBF) kernel, and 95.29% and 99.45% using a linear kernel, for the IITM SCID2 and Indoor-Outdoor datasets, respectively. The lowest CRs, 2.08% and 4.92%, respectively, were obtained when RBF and linear kernels were used with the MIT-67 dataset. In addition, higher CRs, precision, recall, and area under the receiver operating characteristic curve values were obtained for the proposed SCGWDs in comparison with state-of-the-art visual descriptors.
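The descriptor-plus-SVM pipeline above can be sketched in a few lines. This is a minimal illustration assuming scikit-learn; the random 512-D vectors are stand-ins for the paper's GIST-style descriptors, not the actual SCGWD extraction, and the class separation is artificial.

```python
# Sketch of binary indoor-vs-outdoor classification with an SVM.
# The "descriptors" are synthetic stand-ins, NOT the paper's SCGWD.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Two artificially separable classes of 512-D feature vectors.
X_indoor = rng.normal(loc=0.5, scale=1.0, size=(100, 512))
X_outdoor = rng.normal(loc=-0.5, scale=1.0, size=(100, 512))
X = np.vstack([X_indoor, X_outdoor])
y = np.array([1] * 100 + [0] * 100)  # 1 = indoor, 0 = outdoor

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Compare the two kernels evaluated in the paper: RBF and linear.
for kernel in ("rbf", "linear"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{kernel}: {acc:.2f}")
```

The multiclass indoor-scene case in the paper follows the same pattern, since `SVC` handles more than two labels via one-vs-one voting.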

  • Research Article
  • 10.1088/1742-6596/1969/1/012001
Vision-based Position estimation and Indoor scene recognition algorithm for Quadrotor Navigation
  • Jul 1, 2021
  • Journal of Physics: Conference Series
  • B Anbarasu + 1 more

In this paper, an effective and simple grid-based vanishing-point-detection position estimation algorithm and an enhanced GIST descriptor-based indoor scene recognition algorithm for navigation of a MAV in indoor corridor environments are described. Two different classifiers, a k-nearest neighbour (KNN) classifier and a support vector machine (SVM), are employed to categorize indoor scenes into corridor, staircase, or room. Indoor scene classification was performed on Dataset-1. In the training phase of the indoor scene recognition algorithm, GIST, HODMG, and Enhanced-GIST feature vectors are extracted for all indoor training images in Dataset-1, and the indoor scene classifiers are trained on the extracted feature vectors and the assigned labels of the indoor scenes (corridor: 1, staircase: 2, room: 3). In the testing phase, GIST, HODMG, and Enhanced-GIST feature vectors are extracted for each unknown test image frame, and indoor scene classification is performed using the trained scene recognition model. The proposed indoor scene recognition algorithm using SVM with Enhanced-GIST descriptors produced a high recognition rate of 99.33%, compared to the KNN classifiers. After recognizing the indoor scene as a corridor, the MAV estimates its position based on detection of the vanishing point in the indoor corridor image frames. Experimental results show that the proposed method is suitable for real-time operation.

  • Conference Article
  • Citations: 1
  • 10.1109/icpr.2018.8546049
Scalable Monocular SLAM by Fusing and Connecting Line Segments with Inverse Depth Filter
  • Aug 1, 2018
  • Jiyuan Zhang + 2 more

In this paper we propose a fast and robust line-based approach to monocular SLAM. It relies on a novel inverse depth representation of lines capable of tracking line segments across long image sequences. Lines tracked through frames provide crucial directional and positional knowledge for boosting localization performance, and they are more informative for characterizing environments than points, especially in urban outdoor and indoor scenes. The developed two-parameter inverse depth representation of lines is amenable to Kalman filtering, yielding an efficient solver due to its linearity, with lower computational cost than binary descriptors. This filter is also compatible with the inverse depth filter of points; both are incorporated under a unified minimization framework to enhance the performance of monocular SLAM. Experiments on real-world monocular sequences demonstrate that the proposed SLAM system outperforms the state of the art and produces accurate results in both indoor and outdoor scenes.

  • Research Article
  • 10.1007/s41095-022-0299-z
Fuzzy-based indoor scene modeling with differentiated examples
  • May 23, 2023
  • Computational Visual Media
  • Qiang Fu + 4 more

Well-designed indoor scenes incorporate interior design knowledge, which has been an essential prior for most indoor scene modeling methods. However, the layout qualities of indoor scene datasets are often uneven, and most existing data-driven methods do not differentiate indoor scene examples in terms of quality. In this work, we aim to explore an approach that leverages datasets with differentiated indoor scene examples for indoor scene modeling. Our solution conducts subjective evaluations on lightweight datasets having various room configurations and furniture layouts, via pairwise comparisons based on fuzzy set theory. We also develop a system to use such examples to guide indoor scene modeling using user-specified objects. Specifically, we focus on object groups associated with certain human activities, and define room features to encode the relations between the position and direction of an object group and the room configuration. To perform indoor scene modeling, given an empty room, our system first assesses it in terms of the user-specified object groups, and then places associated objects in the room guided by the assessment results. A series of experimental results and comparisons to state-of-the-art indoor scene synthesis methods are presented to validate the usefulness and effectiveness of our approach.

  • Book Chapter
  • Citations: 2
  • 10.1007/978-3-030-23712-7_13
Structure Reconstruction of Indoor Scene from Terrestrial Laser Scanner
  • Jan 1, 2019
  • Xiaojuan Ning + 4 more

Indoor scene reconstruction from point cloud data provided by terrestrial laser scanning (TLS) has become an issue of major interest in recent years. However, raw scanned indoor scenes are often complex, with severe noise, outliers, and incomplete regions, which makes indoor scene modeling more difficult. In this paper, we present an automatic approach to reconstruct the structure of an indoor scene from point clouds acquired by registering several scans. Our method first extracts candidate walls by separating the indoor scene into different planes based on normal variation. The boundaries of those candidate walls are then obtained by projecting them onto 2D planes. We classify the walls into exterior and interior walls by clustering. After distinguishing the 3D points belonging to exterior walls, a simple strategy is applied to refine the 3D model of the wall structure. The methodology has been tested on three real datasets comprising different varieties of indoor scenes. The results reveal that the indoor scene structure can be correctly extracted and modeled.

  • Book Chapter
  • 10.1007/978-3-030-82565-2_36
Multi-viewpoint Rendering Optimization of Indoor Scene Based on Binocular Vision
  • Jan 1, 2021
  • He Jing

The traditional multi-viewpoint rendering method for indoor scenes produces poor lighting and shadow effects, resulting in renderings that are too dark or too bright. Therefore, an optimization method for multi-viewpoint rendering of indoor scenes based on binocular vision is proposed. This research plans light and shadow effects based on the visual relationship between point light sources and indoor scenes, sets rendering points based on binocular vision, and renders indoor scenes from multiple viewpoint angles. Simulation results show that, compared with the traditional rendering method, the studied method achieves excellent light and shadow effects, and the rendered indoor scene has an appropriate balance of light and dark. Keywords: binocular vision; indoor scene; multi-viewpoint rendering; optimization method

  • Conference Article
  • Citations: 4
  • 10.1109/icma.2011.5986286
Fast vision-based object segmentation for natural landmark detection on Indoor Mobile Robot
  • Aug 1, 2011
  • Xiaojie Chai + 2 more

In this paper, a new fast vision-based object segmentation technique that extracts straight-line features from indoor scenes is proposed. An indoor scene always contains natural structures such as doors, walls, ceilings, and floors, which have clear straight lines and large homogeneous color surfaces that can be stably detected to form objects. Objects bounded by lines are well suited for an indoor mobile robot to quickly detect, save as natural landmarks, and use in visual SLAM. Compared with point-of-interest (POI) features such as Harris corners, line features are not only more robust to changes in scale and illumination, but also provide more structural information about the indoor environment. The algorithm works in real time and is stable under variation of illumination. The main idea of the method is to combine straight lines to form many convex polygons: polygons with homogeneous color are kept, and adjacent polygons with similar colors are merged by a merge-test process. A fast line segmentation and fitting method is proposed to improve line detection efficiency, and a half-edge structure is added to simplify polygon generation. Finally, experimental results demonstrate the accuracy and robustness of the proposed algorithm in real indoor environments.

  • Research Article
  • Citations: 11
  • 10.3390/computers13050121
Indoor Scene Classification through Dual-Stream Deep Learning: A Framework for Improved Scene Understanding in Robotics
  • May 14, 2024
  • Computers
  • Sultan Daud Khan + 1 more

Indoor scene classification plays a pivotal role in enabling social robots to seamlessly adapt to their environments, facilitating effective navigation and interaction within diverse indoor scenes. By accurately characterizing indoor scenes, robots can autonomously tailor their behaviors, making informed decisions to accomplish specific tasks. Traditional methods relying on manually crafted features encounter difficulties when characterizing complex indoor scenes. Deep learning models, on the other hand, address the shortcomings of traditional methods by autonomously learning hierarchical features from raw images. Despite their success, existing models still struggle to effectively characterize complex indoor scenes, because there is a high degree of intra-class variability and inter-class similarity within indoor environments. To address this problem, we propose a dual-stream framework that harnesses both global contextual information and local features for enhanced recognition. The global stream captures high-level features and relationships across the scene, while the local stream employs a fully convolutional network to extract fine-grained local information. The proposed dual-stream architecture effectively distinguishes scenes that share similar global contexts but contain different localized objects. We evaluate the performance of the proposed framework on a publicly available benchmark indoor scene dataset, and the experimental results demonstrate its effectiveness.
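The dual-stream idea above can be illustrated with a toy sketch: a global descriptor of the whole image is fused with pooled local features before classification. The simple statistics below are stand-ins for the paper's CNN branches, and the grid size and fusion-by-concatenation step are illustrative assumptions, not the published architecture.

```python
# Toy sketch of dual-stream feature fusion: global context + local detail.
# Hand-crafted statistics stand in for the paper's learned CNN features.
import numpy as np

def global_stream(image):
    # Stand-in for the global branch: coarse whole-image statistics.
    return np.array([image.mean(), image.std()])

def local_stream(image, grid=4):
    # Stand-in for the fully convolutional local branch: per-cell means
    # over a grid x grid partition of the image.
    h, w = image.shape
    cells = []
    for i in range(grid):
        for j in range(grid):
            cell = image[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            cells.append(cell.mean())
    return np.array(cells)

def dual_stream_features(image):
    # Fuse global context and local detail into one descriptor.
    return np.concatenate([global_stream(image), local_stream(image)])

img = np.random.default_rng(0).random((64, 64))
feat = dual_stream_features(img)
print(feat.shape)  # (18,): 2 global values + 16 local cell means
```

A downstream classifier would then be trained on `feat`; the key point is that two images with the same global statistics can still differ in the local cells.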

  • Research Article
  • Citations: 27
  • 10.1016/j.patcog.2018.04.017
Understanding of indoor scenes based on projection of spatial rectangles
  • Apr 21, 2018
  • Pattern Recognition
  • Hui Wei + 1 more


  • Conference Article
  • Citations: 6
  • 10.1109/fskd.2017.8393385
Indoor scene recognition based on deep learning and sparse representation
  • Jul 1, 2017
  • Ning Sun + 3 more

Indoor scene images exhibit small inter-class variety and large intra-class variety because of their content complexity and the influence of illumination changes and partial occlusion. This makes it difficult to effectively represent the semantic information of indoor scenes using traditional shallow feature learning. In this paper, we present a comprehensive method combining deep features and sparse representation for indoor scene recognition. For feature extraction, a Faster R-CNN based multi-class detector is trained to extract object information serving as low-level features. An improved bag-of-words model is designed to build mid-level features from the object-based low-level features while retaining their spatial information. To improve robustness, sparse representation is used to make the final indoor scene recognition decision from the mid-level features. Experimental results on the indoor scene subset of the MIT-67 dataset show that the proposed method achieves superior performance compared to baseline methods.
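A bag-of-words feature over detected objects that retains spatial information, as described above, could look roughly like the following. The object classes, the 2x2 spatial grid, and the detection format are illustrative assumptions; the paper's actual mid-level features come from a Faster R-CNN detector.

```python
# Sketch of an object-based bag-of-words with a coarse spatial grid.
# Classes and detection coordinates are hypothetical illustrations.
import numpy as np

CLASSES = ["chair", "table", "screen", "bed"]  # assumed detector labels

def spatial_bow(detections, grid=2):
    """Build a per-cell object histogram.

    detections: list of (label, x, y) with normalized centers in [0, 1).
    Returns a flattened (grid*grid, n_classes) histogram, so the same
    object set in different image regions yields different features.
    """
    hist = np.zeros((grid * grid, len(CLASSES)))
    for label, x, y in detections:
        cell = int(y * grid) * grid + int(x * grid)  # which grid cell
        hist[cell, CLASSES.index(label)] += 1
    return hist.ravel()

dets = [("chair", 0.1, 0.8), ("table", 0.6, 0.7), ("screen", 0.55, 0.2)]
vec = spatial_bow(dets)
print(vec.shape)  # (16,): 4 cells x 4 classes
```

The sparse-representation classification stage would then operate on vectors like `vec` rather than raw pixels.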

  • Research Article
  • Citations: 2
  • 10.3390/math12223513
Deep Learning-Driven Virtual Furniture Replacement Using GANs and Spatial Transformer Networks
  • Nov 11, 2024
  • Mathematics
  • Resmy Vijaykumar + 4 more

This study proposes a Generative Adversarial Network (GAN)-based method for virtual furniture replacement within indoor scenes. The proposed method addresses the challenge of accurately positioning new furniture in an indoor space by combining image reconstruction with geometric matching via spatial transformer networks and GANs. The system leverages deep learning architectures such as Mask R-CNN for image segmentation and mask generation, and employs DeepLabv3+, EdgeConnect algorithms, and ST-GAN networks to carry out virtual furniture replacement. With the proposed system, furniture shoppers can obtain a virtual shopping experience, providing an easier way to understand the aesthetic effects of furniture rearrangement without the effort of physically moving furniture. The proposed system has practical applications in the furnishing industry and interior design practice, providing a cost-effective and efficient alternative to physical furniture replacement. The results indicate that the proposed method achieves accurate positioning of new furniture in indoor scenes with minimal distortion or displacement. The proposed system is limited to 2D front-view images of furniture and indoor scenes. Future work would involve synthesizing 3D scenes and expanding the system to replace furniture images photographed from different angles, which would enhance the efficiency and practicality of the proposed system for virtual furniture replacement in indoor scenes.

  • Research Article
  • Citations: 10
  • 10.1109/jsen.2020.3024702
Complete and Accurate Indoor Scene Capturing and Reconstruction Using a Drone and a Robot
  • Sep 23, 2020
  • IEEE Sensors Journal
  • Xiang Gao + 5 more

Completeness and accuracy are two important factors in image-based indoor scene 3D reconstruction. Thus, an efficient image capturing scheme that could completely cover the scene, and a robust reconstruction method that could accurately reconstruct the scene are required. To this end, in this article we propose a new pipeline for indoor scene capturing and reconstruction using a mini drone and a ground robot, which takes both capturing completeness and reconstruction accuracy into consideration. First, we use a mini drone to capture aerial video of the indoor scene, from which a 3D aerial map is reconstructed. Then, the robot moving path is planned and a set of ground-view reference images are synthesized from the aerial map. After that, the robot enters the scene and captures ground video autonomously while using the reference images to locate its position during the movement. Finally, the ground and aerial images, which are adaptively extracted from the captured videos, are merged to reconstruct a complete and accurate indoor scene model. Experimental results on two indoor scenes demonstrate the effectiveness and robustness of our proposed indoor scene capturing and reconstruction pipeline.

  • Research Article
  • Citations: 7
  • 10.1111/cgf.14166
Interactive Design and Preview of Colored Snapshots of Indoor Scenes
  • Oct 1, 2020
  • Computer Graphics Forum
  • Qiang Fu + 3 more

This paper presents an interactive system for quickly designing and previewing colored snapshots of indoor scenes. Different from high‐quality 3D indoor scene rendering, which often takes several minutes to render a moderately complicated scene under a specific color theme with high‐performance computing devices, our system aims at improving the effectiveness of color theme design of indoor scenes and employs an image colorization approach to efficiently obtain high‐resolution snapshots with editable colors. Given several pre‐rendered, multi‐layer, gray images of the same indoor scene snapshot, our system is designed to colorize and merge them into a single colored snapshot. Our system also assists users in assigning colors to certain objects/components and infers more harmonious colors for the unassigned objects based on pre‐collected priors to guide the colorization. The quickly generated snapshots of indoor scenes provide previews of interior design schemes with different color themes, making it easy to determine the personalized design of indoor scenes. To demonstrate the usability and effectiveness of this system, we present a series of experimental results on indoor scenes of different types, and compare our method with a state‐of‐the‐art method for indoor scene material and color suggestion and offline/online rendering software packages.
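The colorize-and-merge step described above can be sketched as tinting each pre-rendered gray layer with an assigned color and compositing the layers with masks. This is a rough interpretation under stated assumptions: the layer/mask format, the multiplicative tinting, and all values below are illustrative, not the paper's actual colorization model.

```python
# Sketch: tint pre-rendered gray layers and merge them into one snapshot.
# Layer format, masks, and multiplicative tinting are assumptions.
import numpy as np

def colorize_and_merge(gray_layers, colors, masks):
    """gray_layers: list of HxW shading arrays in [0, 1];
    colors: list of RGB triples in [0, 1]; masks: list of HxW
    binary masks marking where each layer is visible (later layers
    overwrite earlier ones)."""
    h, w = gray_layers[0].shape
    out = np.zeros((h, w, 3))
    for gray, color, mask in zip(gray_layers, colors, masks):
        tinted = gray[..., None] * np.asarray(color)  # keep the shading
        out = np.where(mask[..., None].astype(bool), tinted, out)
    return out

grays = [np.full((8, 8), 0.6), np.full((8, 8), 0.9)]        # background, object
colors = [(0.8, 0.7, 0.5), (0.2, 0.4, 0.9)]                 # assigned colors
masks = [np.ones((8, 8)), np.pad(np.ones((4, 4)), 2)]       # object in center
snapshot = colorize_and_merge(grays, colors, masks)
print(snapshot.shape)  # (8, 8, 3)
```

Because only the tinting step depends on the chosen color theme, re-coloring a snapshot avoids re-rendering the scene, which is the efficiency the paper's system is after.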

  • Research Article
  • Citations: 2
  • 10.1109/tpami.2024.3414441
DebSDF: Delving Into the Details and Bias of Neural Indoor Scene Reconstruction.
  • Dec 1, 2024
  • IEEE transactions on pattern analysis and machine intelligence
  • Yuting Xiao + 3 more

In recent years, the neural implicit surface has emerged as a powerful representation for multi-view surface reconstruction due to its simplicity and State-of-the-Art performance. However, reconstructing smooth and detailed surfaces in indoor scenes from multi-view images presents unique challenges. Indoor scenes typically contain large texture-less regions, making the photometric loss unreliable for optimizing the implicit surface. Previous work utilizes monocular geometry priors to improve the reconstruction in indoor scenes. However, monocular priors often contain substantial errors in thin structure regions due to domain gaps and the inherent inconsistencies when derived independently from different views. This paper presents DebSDF to address these challenges, focusing on the utilization of uncertainty in monocular priors and the bias in SDF-based volume rendering. We propose an uncertainty modeling technique that associates larger uncertainties with larger errors in the monocular priors. High-uncertainty priors are then excluded from optimization to prevent bias. This uncertainty measure also informs an importance-guided ray sampling and adaptive smoothness regularization, enhancing the learning of fine structures. We further introduce a bias-aware signed distance function to density transformation that takes into account the curvature and the angle between the view direction and the SDF normals to reconstruct fine details better. Our approach has been validated through extensive experiments on several challenging datasets, demonstrating improved qualitative and quantitative results in reconstructing thin structures in indoor scenes, thereby outperforming previous work.

  • PDF Download Icon
  • Research Article
  • Citations: 6
  • 10.1145/3632947
Haisor: Human-aware Indoor Scene Optimization via Deep Reinforcement Learning
  • Jan 3, 2024
  • ACM Transactions on Graphics
  • Jia-Mu Sun + 5 more

3D scene synthesis facilitates and benefits many real-world applications. Most scene generators focus on making indoor scenes plausible via learning from training data and leveraging extra constraints such as adjacency and symmetry. Although the generated 3D scenes are mostly plausible with visually realistic layouts, they can be functionally unsuitable for human users to navigate and interact with furniture. Our key observation is that human activity plays a critical role and sufficient free space is essential for human-scene interactions. This is exactly where many existing synthesized scenes fail—the seemingly correct layouts are often not fit for living. To tackle this, we present a human-aware optimization framework Haisor for 3D indoor scene arrangement via reinforcement learning, which aims to find an action sequence to optimize the indoor scene layout automatically. Based on the hierarchical scene graph representation, an optimal action sequence is predicted and performed via Deep Q-Learning with Monte Carlo Tree Search (MCTS), where MCTS is our key feature to search for the optimal solution in long-term sequences and large action space. Multiple human-aware rewards are designed as our core criteria of human-scene interaction, aiming to identify the next smart action by leveraging powerful reinforcement learning. Our framework is optimized end-to-end by giving the indoor scenes with part-level furniture layout including part mobility information. Furthermore, our methodology is extensible and allows utilizing different reward designs to achieve personalized indoor scene synthesis. Extensive experiments demonstrate that our approach optimizes the layout of 3D indoor scenes in a human-aware manner, which is more realistic and plausible than original state-of-the-art generator results, and our approach produces superior smart actions, outperforming alternative baselines.
