RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction

Abstract

The introduction of neural implicit representations has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations, such as TSDF, they substantially improve mapping completeness and memory efficiency. However, the lack of reconstruction details and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation comprising an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such a mixed representation allows for detail-rich reconstruction within a bounded time and memory budget, in contrast to the overly smoothed results of purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. We also adopt a local moving volume to factorize the whole mixed scene representation with a divide-and-conquer design, facilitating efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses state-of-the-art methods, based on either explicit or implicit representations, in the accuracy of both mapping and tracking on large-scale scenes.
Project page can be found at https://lanlan96.github.io/RemixFusion/.
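As a purely illustrative sketch of the residual idea described in the abstract (not the authors' implementation — `CoarseTSDF`, `residual_mlp`, and every parameter value here are hypothetical), the mixed representation can be read as an explicit coarse grid lookup plus a small learned correction:

```python
import numpy as np

# Hypothetical sketch of a residual-based mixed map: the SDF value at a query
# point is an explicit coarse TSDF lookup plus a learned neural residual.

class CoarseTSDF:
    """Explicit coarse TSDF grid with nearest-voxel lookup."""
    def __init__(self, resolution=16, extent=1.0):
        self.res = resolution
        self.extent = extent
        self.grid = np.ones((resolution,) * 3)  # initialized to truncation value 1.0

    def query(self, p):
        # Map a point in [0, extent)^3 to a voxel index and read its value.
        idx = np.clip((p / self.extent * self.res).astype(int), 0, self.res - 1)
        return self.grid[tuple(idx)]

def residual_mlp(p, weights):
    # Stand-in for the implicit neural module: a tiny two-layer MLP that
    # predicts a small signed correction (fine-grained detail) at point p.
    h = np.tanh(weights["W1"] @ p + weights["b1"])
    return float(weights["W2"] @ h + weights["b2"])

def mixed_sdf(p, tsdf, weights):
    # Mixed representation: coarse explicit value + fine implicit residual.
    return tsdf.query(p) + residual_mlp(p, weights)

rng = np.random.default_rng(0)
weights = {
    "W1": rng.normal(scale=0.1, size=(8, 3)),
    "b1": np.zeros(8),
    "W2": rng.normal(scale=0.1, size=(1, 8)),
    "b2": np.zeros(1),
}
tsdf = CoarseTSDF()
value = mixed_sdf(np.array([0.5, 0.5, 0.5]), tsdf, weights)
```

The appeal, per the abstract, is that the explicit grid bounds time and memory while the residual network supplies fine detail on top of it.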

Similar Papers
  • Research Article
  • Cited by: 40
  • 10.1016/j.engfracmech.2016.03.027
A novel hybrid approach for level set characterization and tracking of non-planar 3D cracks in the extended finite element method
  • Mar 25, 2016
  • Engineering Fracture Mechanics
  • Alireza Sadeghirad + 4 more

  • Research Article
  • Cited by: 4
  • 10.1007/s00221-022-06511-7
The distorted hand: systematic but 'independent' distortions in both explicit and implicit hand representations in young female adults.
  • Nov 21, 2022
  • Experimental Brain Research
  • Lara A Coelho + 2 more

It has long been assumed that an accurate representation of the size and shape of one's body is necessary to successfully interact with the environment. Previous research has shown accurate representations when healthy participants make overt judgments (i.e. explicit) about the size of their bodies. However, when body size is judged implicitly, studies have shown systematic distortions. One suggestion for these differences is that explicit and implicit representations are informed by different sensory modalities: explicit representations rely on vision, whereas implicit representations are informed by haptics. We designed an experiment to investigate whether explicit representations that are informed by haptics are more like implicit representations, featuring systematic distortions. We asked female participants to estimate the size of their fingers and hands in three different tasks: an explicit-haptic, an implicit, and an explicit-vision task. The results showed that all three representations were distorted and, furthermore, that the distortions for each representation differed from one another. These results suggest that inaccurate representations of finger and hand length are a stereotypical feature of body representation present in both the visual and haptic domains. We discuss the results in relation to theories of body representation.

  • Research Article
  • Cited by: 23
  • 10.1109/lgrs.2018.2866280
Hand-Held 3-D Reconstruction of Large-Scale Scene With Kinect Sensors Based on Surfel and Video Sequences
  • Dec 1, 2018
  • IEEE Geoscience and Remote Sensing Letters
  • Haonan Xu + 2 more

This letter presents a hand-held reconstruction method for complex large-scale scenes using Kinect sensors, based on surfels and video sequences. Feature-point-based simultaneous localization and mapping (SLAM) is employed to estimate the camera pose, and bundle adjustment combining 2-D and 3-D feature points is then used to optimize it. The surfel model is employed to construct deformation maps for the fusion and optimization of point clouds, from which an accurate 3-D map is finally obtained. The main contributions of this letter are: 1) by using the SLAM method to obtain the camera pose as the initial value for optimization, the memory and efficiency problems of the structure-from-motion method are alleviated; 2) sparsely textured regions are reconstructed better by bundle adjustment combining 2-D and 3-D feature points; and 3) dense 3-D reconstruction of large scenes is achieved, with more elaborate reconstructed models. Experimental results show that the proposed method applies to a variety of complex large-scale scenes and obtains accurate 3-D models. The presented 3-D reconstruction method can be widely used in human-computer interaction, consumer electronics, and virtual reality.
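The first contribution above — seeding optimization with a SLAM pose — can be illustrated with a toy, hypothetical sketch (identity rotation, synthetic points, and all parameter values are assumptions, not the letter's implementation): refining a camera translation by gradient descent on the 2-D reprojection error.

```python
import numpy as np

# Toy sketch: refine a camera translation by minimizing the squared 2-D
# reprojection error of known 3-D points, starting from a noisy "SLAM"
# estimate. Rotation is fixed to identity for brevity.

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed pinhole intrinsics

def project(points, t):
    cam = points + t               # world -> camera (identity rotation)
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]  # perspective division

def reprojection_error(t, points, obs):
    return np.sum((project(points, t) - obs) ** 2)

rng = np.random.default_rng(1)
points = rng.uniform(-1.0, 1.0, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
t_true = np.array([0.1, -0.05, 0.2])
obs = project(points, t_true)      # observed 2-D features

t0 = t_true + np.array([0.05, 0.05, -0.05])  # noisy initial pose from "SLAM"
t = t0.copy()
for _ in range(500):               # numeric-gradient descent
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = 1e-6
        grad[i] = (reprojection_error(t + d, points, obs)
                   - reprojection_error(t - d, points, obs)) / 2e-6
    t -= 2e-6 * grad               # small fixed step for the sketch
```

A real bundle adjustment jointly optimizes rotations, many cameras, and the 3-D points with analytic Jacobians; the point here is only that a good initial value makes local refinement well-behaved.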

  • Book Chapter
  • Cited by: 1
  • 10.1002/0470091355.ecm013.pub2
Geometric Modeling for Engineering Applications
  • Aug 15, 2004
  • F ‐E Wolter + 2 more

A geometric model of an object—in most cases being a subset of the 3D space—can be used to better understand the object's structure or behavior. Therefore data such as the geometry, the topology and other application specific data have to be represented by the model. With the help of a computer it is possible to manipulate, process or display these data. We will discuss different approaches for representing such an object: Volume‐based representations describe the object in a direct way, whereas boundary representations describe the object indirectly by specifying its boundary. A variety of different surface patches can be used to model the object's boundary. For many applications it is sufficient to know only the boundary of an object. For special objects explicit or implicit mathematical representations can easily be given. An explicit representation is a map from a known parameter space for instance the unit cube to 3D‐space. Implicit representations are equations or relations such as the set of zeros of a functional with three unknowns. These can be very efficient in special cases. As an example of volume‐based representations we will give a brief overview of the voxel representation. We also show how the boundary of complex objects can be assembled by simpler parts such as surface patches. These come in a variety of forms: planar polygons, parametric surfaces defined by a map from 2D space to 3D space, especially spline surfaces and trimmed surfaces, multiresolutionally represented surfaces (for example wavelet‐based) and surfaces obtained by subdivision schemes. In a boundary representation only the boundary of a solid is described. This is usually done by describing the boundary as a collection of surface patches attached to each other at outer edges. One of the (topologically) most complete schemes is the half‐edge data structure as described by Mäntylä. 
Simple objects constructed via any of the methods above can be joined to build more complex objects via Boolean operators (constructive solid geometry, CSG). Constructing an object one has to assure that the object is in agreement with the topological requirements of the modeling system. Notoriously difficult problems are caused by the fact that most modeling systems can compute surface intersections only with a limited precision. This yields numerical results that may finally cause major errors such as topologically contradictory conclusions. The rather new method of ‘medial modeling’ is also presented. Here an object is described by its medial axis and an associated radius function. The medial axis itself is a collection of lower dimensional objects, that is, for a 3D solid a set of points, curves and surface patches. This medial modeling concept developed at the Welfenlab yields a very intuitive user interface (UI) useful for solid modeling, and also gives as a by‐product a natural meshing of the solid for FEM computations. Additional attributes can be attached to an object, like attributes of physical origin or logical attributes. Physical attributes include photometric, haptical and other material properties, such as elasticity or roughness. Physical attributes are often specified by textures. These textures are mapped to the surface to relate surface points to certain quantities of the attribute. The most common use for these is photometric textures, although they can also be used for roughness etc. Logical attributes relate the object to its (data) environment. They can for example group objects that are somehow related, or they can associate scripts to the object, such as callbacks for user interactions.

  • Book Chapter
  • Cited by: 4
  • 10.1002/0470091355.ecm013
Geometric Modeling of Complex Shapes and Engineering Artifacts
  • Aug 15, 2004
  • F.‐E Wolter + 2 more

A geometric model of an object – in most cases being a subset of the 3‐D space – can be used to better understand the object's structure or behavior. Therefore data such as the geometry, the topology, and other application specific data have to be represented by the model. With the help of a computer it is possible to manipulate, process, or display these data. We will discuss different approaches for representing such an object: Volume‐based representations describe the object in a direct way, whereas boundary representations describe the object indirectly by specifying its boundary. A variety of different surface patches can be used to model the object's boundary. For many applications, it is sufficient to know only the boundary of an object. For special objects, explicit or implicit mathematical representations can easily be given. An explicit representation is a map from a known parameter space, for example, the unit cube to 3‐D space. Implicit representations are equations or relations such as the set of zeros of a functional with three unknowns. These can be very efficient in special cases. As an example of volume‐based representations, we will give a brief overview of the voxel representation. We also show how the boundary of complex objects can be assembled by simpler parts, for example, surface patches. These come in a variety of forms: planar polygons, parametric surfaces defined by a map from 2‐D space to 3‐D space, especially spline surfaces and trimmed surfaces, multiresolutionally represented surfaces, for example, wavelet‐based surfaces and surfaces obtained by subdivision schemes. In a boundary representation, only the boundary of a solid is described. This is usually done by describing the boundary as a collection of surface patches attached to each other at outer edges. One of the (topologically) most complete schemes is the half‐edge data structure as described by Mäntylä. 
Simple objects constructed via any of the methods above can be joined to build more complex objects via Boolean operators (constructive solid geometry, CSG). Constructing an object one has to assure that the object is in agreement with the topological requirements of the modeling system. Notoriously difficult problems are caused by the fact that most modeling systems can compute surface intersections only with a limited precision. This yields numerical results that may finally cause major errors, for example, topologically contradictory conclusions. The rather new method of ‘Medial Modeling’ is also presented. Here an object is described by its medial axis and an associated radius function. The medial axis itself is a collection of lower dimensional objects, that is, for a 3‐D solid a set of points, curves, and surface patches. This medial modeling concept developed at the Welfenlab yields a very intuitive user interface useful for solid modeling, and also gives as a by‐product a natural meshing of the solid for FEM computations. Additional attributes can be attached to an object, that is, attributes of physical origin or logical attributes. Physical attributes include photometric, haptical, and other material properties, such as elasticity or roughness. Physical attributes are often specified by textures, which are functions that relate surface points to certain quantities of the attribute. The most common use for these are photometric textures, although they can also be used for roughness and so on. Logical attributes relate the object to its (data‐)environment. They can, for example, group objects which are somehow related or they can associate scripts to the object, such as callbacks for user interactions.

  • Research Article
  • 10.6519/tjl.2004.2(1).4
The Development of Phonological Representations among Chinese-Speaking Children
  • Jun 1, 2004
  • Taiwan journal of linguistics
  • Chieh–Fang Hu

Two experiments were conducted to test the hypothesis that implicit and explicit phonological representations of Chinese-speaking children undergo restructuring with reading experience. Implicit representations refer to those that are automatically handled by the language module; explicit representations refer to those that are explicit control units in speech manipulation. Experiment Ⅰ studied the restructuring process among a group of pre-schoolers who had little knowledge of printed words when they were first tested. Experiment Ⅱ studied a group of novice readers from the beginning of the first grade until the end of the third grade. The results revealed that children's memory errors contained more errors in transposing subsyllabic units available in the stimulus strings than errors in misordering entire syllables. The proportion of transposition errors increased significantly from the pre-school age to the first grade and was related to the rapid increase in the children's ability to read Zhuyin fuhao. The developmental sequence of explicit representations parallels that of implicit representations in that pre-schoolers could not cope with the phonological awareness task at the subsyllabic level before they could demonstrate any ability in reading, whereas first-grade children demonstrated fundamental control over the task. Children's performance on phonological awareness was related to Zhuyin fuhao reading ability at the last testing session of the pre-school age and continued to be evident until the third grade. Though parallel in developmental sequence, there was no evidence that the development of explicit representations depended on the development of implicit representations. The findings are discussed in relation to the notion of modularity.

  • Conference Article
  • Cited by: 29
  • 10.1109/3dv50981.2020.00052
Semantic Implicit Neural Scene Representations With Semi-Supervised Training
  • Nov 1, 2020
  • Amit Pal Singh Kohli + 2 more

The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network which can be queried at any coordinate to produce these same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry information in a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) [67] is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics.

  • Research Article
  • Cited by: 22
  • 10.1080/00313830701356034
Knowledge for Real: On implicit and explicit representations and education
  • Jul 1, 2007
  • Scandinavian Journal of Educational Research
  • Theresa S S Schilhab

Explicit knowledge is highly valued in the traditional educational system at the expense of implicit equivalents. Why is this? Are we right to so value it? To provide an answer I attempt to characterise the relation between representations and referents of implicit and explicit knowledge. Whereas explicit representations are detached from referents and so “inauthentic”, implicit representations are analogous to referents and more “authentic”. The notion of authenticity frames to what extent critical properties such as transferability to different contexts and conveyance between agents, which seem vital to class teaching, are met by implicit and explicit knowledge.

  • Research Article
  • Cited by: 11
  • 10.1111/cgf.14592
Physics‐Based Inverse Rendering using Combined Implicit and Explicit Geometries
  • Jul 1, 2022
  • Computer Graphics Forum
  • G Cai + 4 more

Mathematically representing the shape of an object is a key ingredient for solving inverse rendering problems. Explicit representations like meshes are efficient to render in a differentiable fashion but have difficulties handling topology changes. Implicit representations like signed‐distance functions, on the other hand, offer better support of topology changes but are much more difficult to use for physics‐based differentiable rendering. We introduce a new physics‐based inverse rendering pipeline that uses both implicit and explicit representations. Our technique enjoys the benefit of both representations by supporting both topology changes and differentiable rendering of complex effects such as environmental illumination, soft shadows, and interreflection. We demonstrate the effectiveness of our technique using several synthetic and real examples.

  • Research Article
  • Cited by: 1
  • 10.1177/13591045221113390
Explicit and implicit attachment representations in cognitively able school-age children with autism spectrum disorder: A window to their inner world.
  • Jul 6, 2022
  • Clinical Child Psychology and Psychiatry
  • Michele Giannotti + 3 more

The few studies available on quality of attachment in school-age children with autism spectrum disorder (ASD) exclusively used questionnaires assessing explicit attachment representations. Thus, in the current study we assessed both explicit and implicit attachment representations in 23 children with ASD (without intellectual disability), 22 with specific learning disabilities (SLD) and 27 with typical development (TD), aged 7 to 13 years. A self-reported measure of the quality of attachment to parents and a semi-structured interview were administered to the children. In addition, a developmental assessment of the child, including measures of intelligence and social-communication impairment, was conducted. Despite the lack of group differences in explicit attachment representations, we found that children with ASD showed higher rates of at-risk self-protective strategies and psychological trauma compared to the TD group. Children with SLD also showed higher levels of at-risk implicit attachment representations than TD children, albeit to a lesser extent than children with ASD. These results may be related to several factors associated with ASD impairment and developmental pathways, such as the atypical learning process that occurs at the interpersonal level, and difficulties in social information processing and reflective functioning. Our findings suggest that children with ASD may experience difficulties in the construction of balanced implicit attachment representations. Thus, a more comprehensive assessment of attachment including both implicit and explicit representations is recommended.

  • Conference Article
  • 10.1117/12.970009
Sensor Fusion: Storage, Search And Problem Solving Efficiency Issues Associated With Reasoning In Context
  • Mar 1, 1990
  • Richard Antony

This paper proposes an abstract model of the data fusion process that is based on a generalization of the sensor correlation paradigm. The fusion model demonstrates that sensor fusion is a proper subset of data fusion and reveals two fundamental approaches to individual fusion processes based on explicit and implicit representations of knowledge. Most attempts to automate data fusion have been based on various forms of explicit representation, while implicit representations are more characteristic of the knowledge representations employed by human problem solvers. The efficiency of the two approaches is contrasted for two specific fusion problems: doctrinal templating and path planning. While an explicit-representation approach may require the exhaustive representation of a large number of explicit templates, an implicit-representation approach (which supports reasoning in context) may require the maintenance, search and manipulation of large domain knowledge bases. A previously proposed database organization is recommended that supports the fusion process for both representation classes by facilitating highly efficient mixed semantic and spatially oriented queries and manipulation.

  • Research Article
  • Cited by: 2
  • 10.1109/access.2022.3187093
A Novel Bundle Adjustment Approach Based on Guess-Aided and Angle Quantization Multiobjective Particle Swarm Optimization (GAMOPSO) for 3D Reconstruction Applications
  • Jan 1, 2022
  • IEEE Access
  • Maher Alndiwee + 2 more

The 3D reconstruction process is very important in a variety of computer vision applications. Bundle adjustment has a significant impact on 3D reconstruction pipelines, namely Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM). Bundle adjustment, which optimizes camera parameters and 3D points as a crucial final stage, suffers from memory and efficiency limitations in very large-scale reconstruction. Multi-objective optimization (MOO) is used to solve a variety of realistic engineering problems, and Multi-Objective Particle Swarm Optimization (MOPSO) is regarded as one of the state-of-the-art meta-heuristic MOO methods. MOPSO utilizes the concept of crowding distance as a measure to differentiate between solutions in the search space and provide a high level of exploration. However, this method ignores the direction of exploration, which is not sufficient to effectively explore the search space. In addition, MOPSO starts the search from a fully randomly initialized swarm without taking any prior knowledge about the initial guess into account, which is impractical in applications such as bundle adjustment where initial values for solutions can be estimated. In this paper, we introduce a novel hybrid MOPSO-based bundle adjustment algorithm that takes advantage of an initial guess, an angle quantization technique, and traditional optimization algorithms like RADAM to improve the mobility of MOPSO solutions; the results show that our algorithm can improve the accuracy and efficiency of bundle adjustment (BA).
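The "guess-aided" initialization idea can be sketched with a minimal single-objective PSO (the paper's actual algorithm is multi-objective and adds angle quantization and RADAM; the cost function and every parameter below are hypothetical stand-ins):

```python
import numpy as np

# Hypothetical sketch: seed a particle swarm around an initial estimate
# (as bundle adjustment can, from SLAM/SfM) instead of a fully random swarm.

def cost(x):
    # Stand-in for a bundle-adjustment objective (sum of squared residuals).
    return np.sum((x - np.array([1.0, -2.0, 0.5])) ** 2)

def pso(initial_guess, n_particles=30, iters=150, seed=0):
    rng = np.random.default_rng(seed)
    dim = initial_guess.size
    # Guess-aided initialization: particles start near the prior estimate.
    x = initial_guess + rng.normal(scale=0.5, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        c = np.array([cost(p) for p in x])
        better = c < pbest_cost
        pbest[better], pbest_cost[better] = x[better], c[better]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return gbest

guess = np.array([1.2, -1.8, 0.3])  # e.g. an initial estimate from SLAM
best = pso(guess)
```

Seeding near a sensible prior shrinks the region the swarm must explore, which is the practical advantage the paper exploits for large-scale BA.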

  • Research Article
  • Cited by: 10
  • 10.1609/aaai.v34i07.6890
Optical Flow in Deep Visual Tracking
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Mikko Vihlman + 1 more

Single-target tracking of generic objects is a difficult task since a trained tracker is given information present only in the first frame of a video. In recent years, increasingly many trackers have been based on deep neural networks that learn generic features relevant for tracking. This paper argues that deep architectures are often fit to learn implicit representations of optical flow. Optical flow is intuitively useful for tracking, but most deep trackers must learn it implicitly. This paper is among the first to study the role of optical flow in deep visual tracking. The architecture of a typical tracker is modified to reveal the presence of implicit representations of optical flow and to assess the effect of using the flow information more explicitly. The results show that the considered network learns implicitly an effective representation of optical flow. The implicit representation can be replaced by an explicit flow input without a notable effect on performance. Using the implicit and explicit representations at the same time does not improve tracking accuracy. The explicit flow input could allow constructing lighter networks for tracking.

  • Book Chapter
  • 10.1016/b978-012443880-4/50091-0
47 - The Trainer System: Applying QR Techniques to Intelligent Tutoring Systems
  • Jan 1, 2002
  • Expert Systems, Six-Volume Set
  • C Quek + 2 more

  • Conference Article
  • Cited by: 4
  • 10.1109/ieeeconf56349.2022.10052055
Implicit Neural Representation for Mesh-Free Inverse Obstacle Scattering
  • Oct 31, 2022
  • Tin Vlašić + 3 more

Implicit representation of shapes as level sets of multilayer perceptrons has recently flourished in different shape analysis, compression, and reconstruction tasks. In this paper, we introduce an implicit neural representation-based framework for solving the inverse obstacle scattering problem in a mesh-free fashion. We express the obstacle shape as the zero-level set of a signed distance function which is implicitly determined by network parameters. To solve the direct scattering problem, we implement the implicit boundary integral method. It uses projections of the grid points in the tubular neighborhood onto the boundary to compute the PDE solution directly in the level-set framework. The proposed implicit representation conveniently handles the shape perturbation in the optimization process. To update the shape, we use PyTorch's automatic differentiation to backpropagate the loss function w.r.t. the network parameters, allowing us to avoid complex and error-prone manual derivation of the shape derivative. Additionally, we propose a deep generative model of implicit neural shape representations that can fit into the framework. The deep generative model effectively regularizes the inverse obstacle scattering problem, making it more tractable and robust, while yielding high-quality reconstruction results even in noise-corrupted setups.
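As a hedged sketch of the level-set idea (not the paper's pipeline — the perturbed-sphere function and fixed weights below merely stand in for a trained network), a shape can be represented as the zero-level set of a scalar function, with surface points recovered by root-finding along rays, loosely echoing the projection of nearby points onto the boundary:

```python
import numpy as np

# Illustrative implicit shape: a sphere perturbed by a tiny tanh term that
# stands in for a trained MLP. The obstacle boundary is the zero-level set.

w = np.array([0.3, -0.2, 0.5])  # hypothetical fixed "network" weights

def implicit_f(p):
    # Signed function; negative inside the shape, positive outside.
    return np.linalg.norm(p) - 1.0 + 0.1 * np.tanh(w @ p)

def surface_point(direction, t_lo=0.0, t_hi=2.0, iters=60):
    # Bisect along the ray t * direction for the zero crossing:
    # f(0) < 0 (inside) and f at t_hi = 2 is > 0 (outside) by construction.
    d = direction / np.linalg.norm(direction)
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if implicit_f(t_mid * d) < 0.0:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi) * d

p = surface_point(np.array([1.0, 1.0, 0.0]))
residual = implicit_f(p)  # near zero on the recovered boundary point
```

In the paper the analogous function is an MLP whose weights are optimized by backpropagation, so the boundary deforms smoothly as the parameters change.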
