FREQ-MIP-AA: Frequency Mip Representation for Anti-Aliasing Neural Radiance Fields
Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images at camera distances that differ from the training views. To address this issue, Mip-NeRF proposed rendering a pixel with volumetric frustums and introduced integrated positional encoding (IPE). While effective, this approach requires long training times due to its reliance on an MLP architecture. In this work, we propose a novel anti-aliasing technique built on grid-based representations, which typically train significantly faster. In addition, inspired by the sampling theorem, we exploit a frequency-domain representation to handle the aliasing problem. The proposed method, FreqMipAA, utilizes scale-specific low-pass filtering (LPF) and learnable frequency masks: the scale-specific low-pass filters prevent aliasing and prioritize important image details, while the learnable masks remove problematic high-frequency components and retain essential information. We validated the proposed technique by incorporating it into a widely used grid-based method. The experimental results show that FreqMipAA effectively resolves the aliasing issues and achieves state-of-the-art results on the multi-scale Blender dataset. Our code is available at https://github.com/yi0109/FreqMipAA.
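To make the frequency-domain idea concrete, the following is a minimal PyTorch sketch of a feature plane whose spectrum is attenuated per rendering scale by a fixed low-pass window and a learnable mask. The class name, dyadic cutoff schedule, and sigmoid-gated mask are illustrative assumptions, not the released FreqMipAA implementation.

```python
import torch
import torch.nn as nn

class FreqMaskedGrid(nn.Module):
    """Hypothetical sketch: a 2D feature plane filtered in the frequency domain,
    per rendering scale, with a fixed low-pass window and a learnable mask."""
    def __init__(self, resolution=128, channels=8, num_scales=4):
        super().__init__()
        # Real-valued spatial grid; its FFT is taken on the fly for clarity.
        self.grid = nn.Parameter(torch.randn(channels, resolution, resolution) * 0.1)
        # One learnable mask logit map per scale (shared across channels here).
        self.mask_logits = nn.Parameter(torch.zeros(num_scales, resolution, resolution))

    def low_pass(self, scale, resolution, device):
        # Keep only frequencies below a per-scale cutoff (coarser scale -> lower cutoff).
        freqs = torch.fft.fftfreq(resolution, device=device)
        fy, fx = torch.meshgrid(freqs, freqs, indexing="ij")
        radius = torch.sqrt(fx ** 2 + fy ** 2)
        cutoff = 0.5 / (2 ** scale)          # assumed dyadic cutoff schedule
        return (radius <= cutoff).float()

    def forward(self, scale):
        spectrum = torch.fft.fft2(self.grid)                    # (C, H, W), complex
        lpf = self.low_pass(scale, self.grid.shape[-1], self.grid.device)
        soft_mask = torch.sigmoid(self.mask_logits[scale])      # learnable, in [0, 1]
        filtered = spectrum * lpf * soft_mask                   # suppress aliasing frequencies
        return torch.fft.ifft2(filtered).real                   # back to a spatial grid
```

The low-pass window enforces the sampling-theorem intuition (coarser renders keep fewer frequencies), while the learnable mask lets training decide which of the remaining frequencies actually matter.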
- Research Article
173
- 10.1145/3592426
- Jul 26, 2023
- ACM Transactions on Graphics
Neural radiance fields enable state-of-the-art photorealistic view synthesis. However, existing radiance field representations are either too compute-intensive for real-time rendering or require too much memory to scale to large scenes. We present a Memory-Efficient Radiance Field (MERF) representation that achieves real-time rendering of large-scale scenes in a browser. MERF reduces the memory consumption of prior sparse volumetric radiance fields using a combination of a sparse feature grid and high-resolution 2D feature planes. To support large-scale unbounded scenes, we introduce a novel contraction function that maps scene coordinates into a bounded volume while still allowing for efficient ray-box intersection. We design a lossless procedure for baking the parameterization used during training into a model that achieves real-time rendering while still preserving the photorealistic view synthesis quality of a volumetric radiance field.
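The contraction idea can be illustrated with a short sketch. The piecewise-projective form below is one plausible reading of "maps scene coordinates into a bounded volume while still allowing for efficient ray-box intersection"; it is not necessarily the exact function used by MERF.

```python
import torch
import torch.nn.functional as F

def contract(x: torch.Tensor) -> torch.Tensor:
    """Sketch of a piecewise-projective contraction (assumed form): points with
    infinity-norm <= 1 are kept, everything else is squashed into [-2, 2]^3 so that
    axis-aligned ray-box intersection stays cheap. x has shape (..., 3)."""
    abs_x = x.abs()
    inf_norm, argmax = abs_x.max(dim=-1, keepdim=True)          # per-point L-infinity norm
    inside = inf_norm <= 1.0
    # Non-dominant coordinates are scaled by 1 / ||x||_inf.
    scaled = x / inf_norm.clamp(min=1e-9)
    # The dominant coordinate is mapped to (2 - 1/|x_j|) * sign(x_j).
    dominant = (2.0 - 1.0 / abs_x.clamp(min=1e-9)) * x.sign()
    outside = torch.where(F.one_hot(argmax.squeeze(-1), num_classes=3).bool(),
                          dominant, scaled)
    return torch.where(inside, x, outside)
```

Points inside the unit cube are untouched and points at infinity land on the boundary of the [-2, 2]^3 cube, so a fixed bounding box suffices for ray-box tests against the contracted scene.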
- Research Article
3
- 10.1109/tvcg.2024.3456184
- Nov 1, 2024
- IEEE transactions on visualization and computer graphics
Neural radiance fields (NeRF) have achieved a revolutionary breakthrough in the novel view synthesis task for complex 3D scenes. However, this new paradigm struggles to meet the requirements for real-time rendering and high perceptual quality in virtual reality. In this paper, we propose VPRF, a novel visual-perception-based radiance field representation method, which for the first time integrates the visual acuity and contrast sensitivity models of the human visual system (HVS) into the radiance field rendering framework. Initially, we encode both the appearance and visual sensitivity information of the scene into our radiance field representation. Then, we propose a visual perceptual sampling strategy, allocating computational resources according to the HVS sensitivity of different regions. Finally, we propose a sampling weight-constrained training scheme to ensure the effectiveness of our sampling strategy and improve the representation of the radiance field based on the scene content. Experimental results demonstrate that our method renders more efficiently, with higher PSNR and SSIM in the foveal and salient regions compared to the state-of-the-art FoV-NeRF. The results of the user study confirm that our rendering results exhibit high-fidelity visual perception.
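As a rough illustration of sensitivity-driven resource allocation (not the paper's actual acuity or contrast-sensitivity model), one might budget ray samples as a function of retinal eccentricity:

```python
import torch

def allocate_samples(eccentricity_deg: torch.Tensor,
                     min_samples: int = 16, max_samples: int = 128) -> torch.Tensor:
    """Illustrative sketch: map per-pixel eccentricity to a sample budget so that
    foveal rays receive more samples than peripheral ones."""
    e0 = 2.3  # hypothetical half-resolution eccentricity in degrees (assumed constant)
    sensitivity = 1.0 / (1.0 + eccentricity_deg / e0)          # assumed acuity falloff
    budget = min_samples + (max_samples - min_samples) * sensitivity
    return budget.round().long()
```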
- Research Article
1
- 10.1609/aaai.v38i3.27972
- Mar 24, 2024
- Proceedings of the AAAI Conference on Artificial Intelligence
Traditional Radiance Field (RF) representations capture details of a specific scene and must be trained afresh on each scene. Semantic feature fields have been added to RFs to facilitate several segmentation tasks. Generalised RF representations learn the principles of view interpolation. A generalised RF can render new views of an unknown and untrained scene, given a few views. We present a way to distil feature fields into the generalised GNT representation. Our GSN representation generates new views of unseen scenes on the fly along with consistent, per-pixel semantic features. This enables multi-view segmentation of arbitrary new scenes. We show different semantic features being distilled into generalised RFs. Our multi-view segmentation results are on par with methods that use traditional RFs. GSN closes the gap between standard and generalisable RF methods significantly. Project Page: https://vinayak-vg.github.io/GSN/
- Conference Article
66
- 10.1109/cvpr46437.2021.01294
- Jun 1, 2021
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation. Recent work has shown that neural networks are surprisingly effective at the task of compressing many views of a scene into a learned function which maps from a viewing ray to an observed radiance value via volume rendering. Unfortunately, these methods lose all their predictive power once any object in the scene has moved. In this work, we explicitly model rigid motion of objects in the context of neural representations of radiance fields. We show that without any additional human specified supervision, we can reconstruct a dynamic scene with a single rigid object in motion by simultaneously decomposing it into its two constituent parts and encoding each with its own neural representation. We achieve this by jointly optimizing the parameters of two neural radiance fields and a set of rigid poses which align the two fields at each frame. On both synthetic and real world datasets, we demonstrate that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes. Our factored representation furthermore enables animation of unseen object motion.
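A compact sketch of the compositional idea, with an assumed mixing rule and assumed field interfaces, might look like this: the static field is queried in world space, the moving field in its canonical frame obtained through the optimized rigid pose, and the two are blended by density.

```python
import torch

def composite_two_fields(points, dirs, pose_R, pose_t, static_field, dynamic_field):
    """Sketch (assumed composition rule): query a static field in world space and a
    rigidly moving field in its own canonical frame, then mix colors by density.
    points, dirs: (N, 3); pose_R: (3, 3); pose_t: (3,); each field maps
    (points, dirs) -> (density (N,), rgb (N, 3))."""
    # Map world-space samples into the moving object's canonical frame: x_c = R^T (x - t).
    canonical = (points - pose_t) @ pose_R
    sigma_s, rgb_s = static_field(points, dirs)
    sigma_d, rgb_d = dynamic_field(canonical, dirs @ pose_R)
    sigma = sigma_s + sigma_d                         # densities add under composition
    w = sigma_d / (sigma + 1e-9)                      # dynamic field's color contribution
    rgb = (1 - w).unsqueeze(-1) * rgb_s + w.unsqueeze(-1) * rgb_d
    return sigma, rgb
```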
- Conference Article
101
- 10.1109/cvpr52688.2022.01577
- Jun 1, 2022
Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has constrained behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without per-scale supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality.
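A minimal multiplicative-filter sketch conveys how a coordinate network can have an analytical, band-limited spectrum: each layer multiplies the hidden state by a sine of the input, so the output spectrum is confined to sums of the per-layer frequency bands. The layer sizes, frequency initialization, and per-layer bandwidth budget below are assumptions rather than BACON's exact configuration.

```python
import torch
import torch.nn as nn

class BandLimitedNet(nn.Module):
    """Sketch of a multiplicative-filter network with sine filters (assumed form)."""
    def __init__(self, in_dim=2, hidden=128, out_dim=3, layers=4, max_freq=64.0):
        super().__init__()
        # Each sine filter draws frequencies from a limited band, so layer l can only
        # add up to max_freq / layers to the accumulated output bandwidth.
        band = max_freq / layers
        self.omegas = nn.ParameterList(
            [nn.Parameter(torch.empty(hidden, in_dim).uniform_(-band, band))
             for _ in range(layers)])
        self.phases = nn.ParameterList(
            [nn.Parameter(torch.rand(hidden) * 3.14159) for _ in range(layers)])
        self.linears = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(layers - 1)])
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        z = torch.sin(x @ self.omegas[0].t() + self.phases[0])
        for lin, omega, phase in zip(self.linears, self.omegas[1:], self.phases[1:]):
            z = torch.sin(x @ omega.t() + phase) * lin(z)   # multiplicative sine filter
        return self.out(z)
```

Because multiplying two band-limited signals adds their bandwidths, reading out the network after fewer layers yields a coarser, lower-frequency version of the same signal.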
- Research Article
10
- 10.1109/tvcg.2024.3429416
- Sep 1, 2025
- IEEE transactions on visualization and computer graphics
Foveated rendering provides an idea for improving the image synthesis performance of neural radiance fields (NeRF) methods. In this article, we propose a scene-aware foveated neural radiance fields method to synthesize high-quality foveated images in complex VR scenes at high frame rates. First, we construct a multi-ellipsoidal neural representation to enhance the neural radiance field's representation capability in salient regions of complex VR scenes based on the scene content. Then, we introduce a uniform sampling based foveated neural radiance field framework to improve the foveated image synthesis performance with one-pass color inference, and improve the synthesis quality by leveraging the foveated scene-aware objective function. Our method synthesizes high-quality binocular foveated images at an average frame rate of 66 frames per second (FPS) in complex scenes with high occlusion, intricate textures, and sophisticated geometries. Compared with the state-of-the-art foveated NeRF method, our method achieves significantly higher synthesis quality in both the foveal and peripheral regions with a 1.41-1.46× speedup. We also conduct a user study to prove that the perceived quality of our method has a high visual similarity with the ground truth.
- Research Article
67
- 10.1145/3592455
- Jul 26, 2023
- ACM Transactions on Graphics
We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin.
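The hash-ensemble idea can be sketched as per-frame blending of several encodings after a learned deformation. The module below uses assumed names, shapes, and interfaces; it is not the NeRSemble codebase.

```python
import torch
import torch.nn as nn

class HashEnsemble(nn.Module):
    """Sketch: blend K multi-resolution feature encodings with per-frame weights
    after warping points by a deformation field (all interfaces are assumptions)."""
    def __init__(self, encodings, num_frames, deform_mlp):
        super().__init__()
        self.encodings = nn.ModuleList(encodings)         # K encoders: (N, 3) -> (N, F)
        self.blend_logits = nn.Parameter(torch.zeros(num_frames, len(encodings)))
        self.deform = deform_mlp                           # maps (points, frame_idx) -> offsets

    def forward(self, x, frame_idx):
        x_canon = x + self.deform(x, frame_idx)            # deformation handles coarse motion
        feats = torch.stack([enc(x_canon) for enc in self.encodings], dim=0)  # (K, N, F)
        w = torch.softmax(self.blend_logits[frame_idx], dim=-1)               # (K,)
        return torch.einsum("k,knf->nf", w, feats)         # per-frame weighted feature blend
```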
- Conference Article
46
- 10.1109/cvpr52729.2023.01981
- Jun 1, 2023
Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields are adopted for fast convergence, and the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, is applied to improve the parameter efficiency of grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf.
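The trainable masking idea can be sketched with a straight-through estimator over coefficient masks; the initialization, 0.5 threshold, and sparsity penalty below are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class MaskedCoefficients(nn.Module):
    """Sketch of a trainable mask over grid coefficients (e.g., wavelet coefficients),
    using a straight-through estimator so hard binary masking stays differentiable."""
    def __init__(self, shape):
        super().__init__()
        self.coeffs = nn.Parameter(torch.randn(shape) * 0.01)
        self.mask_logits = nn.Parameter(torch.zeros(shape))

    def forward(self):
        soft = torch.sigmoid(self.mask_logits)
        hard = (soft > 0.5).float()
        # Straight-through: forward pass uses the hard 0/1 mask, backward sees the soft one.
        mask = hard + soft - soft.detach()
        return self.coeffs * mask

    def sparsity_loss(self):
        # Encourage most coefficients to be masked out (higher sparsity, smaller model).
        return torch.sigmoid(self.mask_logits).mean()
```

At export time only the coefficients whose hard mask is 1 need to be stored, which is where the memory savings come from.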
- Research Article
7
- 10.1007/s11263-023-01936-1
- Nov 11, 2023
- International Journal of Computer Vision
We introduce an improved solution to the neural image-based rendering problem in computer vision. Given a set of images taken from a freely moving camera at train time, the proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time. The key ideas presented in this paper are (i) recovering accurate camera parameters via a robust pipeline from unposed day-to-day images is equally crucial in the neural novel view synthesis problem; (ii) it is rather more practical to model an object's content at different resolutions since dramatic camera motion is highly likely in day-to-day unposed images. To incorporate the key ideas, we leverage the fundamentals of scene rigidity, multi-scale neural scene representation, and single-image depth prediction. Concretely, the proposed approach makes the camera parameters learnable in a neural fields-based modeling framework. By assuming per-view depth prediction is given up to scale, we constrain the relative pose between successive frames. From the relative poses, absolute camera pose estimation is modeled via a graph-neural-network-based multiple motion averaging within the multi-scale neural-fields network, leading to a single loss function. Optimizing the introduced loss function provides camera intrinsics, extrinsics, and image rendering from unposed images. We demonstrate, with examples, that for a unified framework to accurately model multiscale neural scene representation from day-to-day acquired unposed multi-view images, it is equally essential to have precise camera-pose estimates within the scene representation framework. Without considering robustness measures in the camera pose estimation pipeline, modeling for multi-scale aliasing artifacts can be counterproductive. We present extensive experiments on several benchmark datasets to demonstrate the suitability of our approach.
- Research Article
64
- 10.1145/3592460
- Jul 26, 2023
- ACM Transactions on Graphics
We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against the state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can also be applied in the future to other categories with a 3D-aware image generator.
- Research Article
32
- 10.1016/j.jqsrt.2012.12.009
- Jan 9, 2013
- Journal of Quantitative Spectroscopy and Radiative Transfer
A multi-dimensional vector spherical harmonics discrete ordinate method for atmospheric radiative transfer
- Research Article
- 10.1016/0034-4257(77)90001-3
- Jan 1, 1977
- Remote Sensing of Environment
Determination of the optimum field of view and spacing between observations for a satellite-borne scanning radiometer observing the stratosphere
- Research Article
- 10.1007/s00138-025-01742-4
- Sep 23, 2025
- Machine Vision and Applications
Editing implicit and explicit representations of radiance fields: a survey
- Conference Article
696
- 10.1109/iccv48922.2021.01386
- Oct 1, 2021
We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.
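A plane-sweep cost volume of the kind the abstract refers to can be sketched as follows. The variance-based cost, the projection-matrix convention, and the feature shapes are assumptions for illustration, not MVSNeRF's exact pipeline.

```python
import torch
import torch.nn.functional as F

def build_cost_volume(ref_feat, src_feats, proj_mats, depth_values):
    """Sketch of a variance-based plane-sweep cost volume: warp each source feature map
    onto fronto-parallel depth planes of the reference view and use the variance across
    views as the matching cost.
    ref_feat: (C, H, W); src_feats: list of (C, H, W); proj_mats: list of (3, 4)
    source projections already composed with the inverse reference projection (assumed
    convention); depth_values: (D,) plane depths."""
    C, H, W = ref_feat.shape
    D = depth_values.shape[0]
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)   # (3, H*W)

    volumes = [ref_feat.unsqueeze(1).expand(C, D, H, W)]
    for feat, P in zip(src_feats, proj_mats):
        R, t = P[:, :3], P[:, 3:]                                  # (3, 3), (3, 1)
        # Back-project reference pixels to each depth plane, then project into the source.
        cam = pix.unsqueeze(0) * depth_values.view(D, 1, 1)        # (D, 3, H*W)
        proj = R @ cam + t                                         # (D, 3, H*W)
        uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)             # (D, 2, H*W)
        uv = uv.permute(0, 2, 1).reshape(D, H, W, 2)
        gx = uv[..., 0] / (W - 1) * 2 - 1                          # normalize to [-1, 1]
        gy = uv[..., 1] / (H - 1) * 2 - 1
        grid = torch.stack([gx, gy], dim=-1)
        warped = F.grid_sample(feat.unsqueeze(0).expand(D, C, H, W), grid,
                               align_corners=True)                 # (D, C, H, W)
        volumes.append(warped.permute(1, 0, 2, 3))                 # (C, D, H, W)
    stack = torch.stack(volumes, dim=0)                            # (views, C, D, H, W)
    return stack.var(dim=0)                                        # variance as matching cost
```

Low variance at a given voxel means the views agree photometrically at that depth, which is the geometry-aware signal the downstream radiance field reconstruction consumes.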
- Conference Article
20
- 10.1145/3591106.3592276
- Jun 12, 2023
Neural Radiance Field (NeRF) shows dramatic results in synthesising novel views. However, existing controllable and editable NeRF methods are still incapable of both fine-grained editing and cross-scene compositing, greatly limiting their creative editing as well as potential applications. When the radiance field is fine-grained edited and composited, a severe drawback is that varying the orientation of the corresponding explicit scaffold, such as point, mesh, volume, etc., may lead to the degradation of rendering quality. In this work, by combining the respective strengths of the implicit NeRF-based representation and the explicit point-based representation, we present a novel Rotation-Invariant Point-based NeRF (RIP-NeRF) for both fine-grained editing and cross-scene compositing of the radiance field. Specifically, we introduce a novel point-based radiance field representation to replace the Cartesian coordinate as the network input. This rotation-invariant representation is achieved by carefully designing a Neural Inverse Distance Weighting Interpolation (NIDWI) module to aggregate neural points, significantly improving the rendering quality for fine-grained editing. To achieve cross-scene compositing, we disentangle the rendering module and the neural point-based representation in NeRF. After simply manipulating the corresponding neural points, a cross-scene neural rendering module is applied to achieve controllable cross-scene compositing without retraining. The advantages of our RIP-NeRF on editing quality and capability are demonstrated by extensive editing and compositing experiments on room-scale real scenes and synthetic objects with complex geometry.
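Inverse distance weighting over neural points, the core of the NIDWI idea as described, can be sketched as below; the neighborhood size k, the distance power, and the tensor shapes are illustrative assumptions.

```python
import torch

def inverse_distance_interpolate(query, point_xyz, point_feat, k=8, power=2.0):
    """Sketch: aggregate the features of the k nearest neural points with weights
    proportional to 1 / d^power. The weights depend only on distances, so the
    aggregated feature is invariant to rotating the whole point set together.
    query: (Q, 3); point_xyz: (P, 3); point_feat: (P, F)."""
    dists = torch.cdist(query, point_xyz)                    # (Q, P) pairwise distances
    knn_d, knn_idx = dists.topk(k, dim=-1, largest=False)    # k nearest neural points
    w = 1.0 / knn_d.clamp(min=1e-6).pow(power)               # inverse distance weights
    w = w / w.sum(dim=-1, keepdim=True)                      # normalize per query
    neighbors = point_feat[knn_idx]                          # (Q, k, F)
    return (w.unsqueeze(-1) * neighbors).sum(dim=1)          # (Q, F) interpolated feature
```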