EnvGS: Modeling View-Dependent Appearance with Environment Gaussian


Zhejiang University · Ant Group

Abstract

Reconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis. Existing methods that utilize environment maps to model reflections from distant lighting often struggle with high-frequency reflection details and fail to account for near-field reflections. In this work, we introduce EnvGS, a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of the environment. These environment Gaussian primitives are combined with base Gaussian primitives to model the appearance of the whole scene. To render the environment Gaussian primitives efficiently, we develop a ray-tracing-based renderer that leverages the GPU's RT cores for fast rendering. This allows us to jointly optimize our model for high-quality reconstruction while maintaining real-time rendering speeds. Results on multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections, achieving the best rendering quality in real-time novel view synthesis.
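To make the composition concrete, the appearance model described above can be sketched as follows. Writing n and v for the per-pixel normal and view direction obtained from the base Gaussians, β for the blending weight, and c_env for the color obtained by ray tracing the environment Gaussians from the surface point x, one plausible formulation is (our notation; the convex blend is an assumption and the paper's exact weighting may differ):

    \mathbf{r} = 2(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n} - \mathbf{v},
    \qquad
    \mathbf{c} = (1-\beta)\,\mathbf{c}_{\mathrm{base}} + \beta\,\mathbf{c}_{\mathrm{env}}(\mathbf{x},\,\mathbf{r})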

Method

Overview of EnvGS. The rendering process begins by rasterizing the base Gaussian to obtain per-pixel normals, base colors, and blending weights. Next, we render the environment Gaussian in the reflection direction using our ray-tracing-based Gaussian renderer to capture the reflection colors. Finally, we combine the reflection and base colors for the final output. We jointly optimize the environment Gaussian and base Gaussian using monocular normals and ground truth images for supervision.
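As a concrete illustration of the per-pixel math in the reflection and compositing steps, below is a minimal, self-contained NumPy sketch. It is not the authors' implementation: the convex blend and the constant placeholder reflection color are assumptions for illustration; in EnvGS the reflection color comes from ray tracing the environment Gaussians with the RT-core-based renderer.

    import numpy as np

    def reflect(view_dir, normal):
        """Mirror the per-pixel view direction about the surface normal.

        Both inputs are (H, W, 3) arrays; view_dir points from the surface
        toward the camera and normal is unit length. Returns r = 2(n.v)n - v.
        """
        cos = np.sum(normal * view_dir, axis=-1, keepdims=True)
        return 2.0 * cos * normal - view_dir

    def composite(base_color, refl_color, blend_weight):
        """Blend base and reflection colors (a convex blend; an assumption,
        not necessarily the paper's exact weighting)."""
        return (1.0 - blend_weight) * base_color + blend_weight * refl_color

    if __name__ == "__main__":
        H, W = 4, 4
        n = np.tile([0.0, 0.0, 1.0], (H, W, 1))   # flat, camera-facing surface
        v = np.tile([0.0, 0.6, 0.8], (H, W, 1))   # unit view direction
        r = reflect(v, n)                          # -> [0, -0.6, 0.8] per pixel
        # In EnvGS, refl_color would come from tracing the environment
        # Gaussians along r; a constant stands in for it here.
        refl_color = np.full((H, W, 3), 0.9)
        base_color = np.full((H, W, 3), 0.2)
        blend_weight = np.full((H, W, 1), 0.3)
        print(composite(base_color, refl_color, blend_weight)[0, 0])  # [0.41 0.41 0.41]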

Results and Comparisons

Here we demonstrate side-by-side videos comparing our method to top-performing baselines across different captured scenes.

Notice how our method synthesizes accurate reflections of the houses and plants that smoothly move over the car's surface, while baseline methods produce fuzzy reflections that fade in and out depending on the viewpoint.
Our method renders high-quality reflections of the distant scene beyond the windows as well as near-field reflections of the nearby paintings and bowl.
Our method synthesizes accurate reflections of the trees and lamppost that move smoothly across the car's surface and windows, even though these objects are not directly visible in the training images; the baseline methods fail to capture them.
Our method better captures high-frequency reflection details, and is able to render convincing reflections of near-field content. Notice the accurate interreflections of the shiny spheres and statue head.
Our method renders high-quality reflections of the surrounding mall environment, including the ceiling lights and the nearby buildings, while baseline methods fail to capture these details.
Observe how our method synthesizes accurate and smooth reflections of the surrounding houses on the car's surface, as well as the near-field reflections of the parking lines on the car's side door. In contrast, baseline methods produce blurry reflections that fade in and out depending on the viewpoint and fail to accurately capture near-field reflections.
Although semi-transparent yet reflective surfaces such as windows are challenging for our method, it still simulates convincing reflections across the car's body and windshield.
Notice the black reflective surface and the window light reflected on the tabletop. Our method maintains consistent specular highlights across different viewpoints, rather than having them appear and disappear with the viewing angle.

Ablation Studies

Here we demonstrate side-by-side videos comparing our full method to variants with key components ablated on the gardenspheres scene. See the paper for more details.

The "w/o joint optimization" ablation detaches the joint optimization of the base Gaussian and the environment Gaussian from the reflection rendering step, this variant fails to recover accurate geometry, leading to inferior reflection reconstruction and rendering quality. The "w/o monocular normal loss" variant removes the monocular normal constraint, training may become trapped in largely incorrect geometry, resulting in inaccurate reflection reconstruction. The "w/ environment map" variant replaces our core Gaussian environment representation with an environment map representation while keeping all other components unchanged, the result illustrates that while it effectively captures smooth distant reflections, it has difficulty modeling near-field and high-frequency reflections, and produces more bumpy geometry. The "w/o color sabotage" and "w/o normal propagation" ablations remove the color sabotage and normal propagation components, respectively, both variants result in reduced rendering quality.
On this scene, without joint optimization or without color sabotage, sharp details such as the tree branches reflected in the spheres are not accurately reconstructed. Replacing the environment Gaussian with an environment map (the "w/ environment map" variant) causes rough surfaces to be poorly reconstructed and introduces artifacts during rendering. The monocular normal loss is likewise essential for modeling near-field inter-reflections.
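For readers curious about what the monocular normal constraint might look like in practice, below is a small sketch of one common formulation: a cosine penalty between rendered normals and normals predicted by a monocular estimator. This is an illustrative assumption, not the paper's exact loss.

    import numpy as np

    def monocular_normal_loss(rendered_n, mono_n, eps=1e-8):
        """Penalize disagreement between rendered and monocular normals.

        rendered_n, mono_n: (H, W, 3) arrays. Uses 1 - cosine similarity,
        a common choice for normal supervision (assumption, not the
        paper's exact loss).
        """
        rendered_n = rendered_n / (np.linalg.norm(rendered_n, axis=-1, keepdims=True) + eps)
        mono_n = mono_n / (np.linalg.norm(mono_n, axis=-1, keepdims=True) + eps)
        cos = np.sum(rendered_n * mono_n, axis=-1)
        return np.mean(1.0 - cos)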

Related Works

  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  • 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
  • Gaussian Ray Tracing: Fast Tracing of Particle Scenes
  • Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
  • NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections
  • GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
  • 3D Gaussian Splatting with Deferred Reflection

Citation

    
    @article{xie2024envgs,
        title={EnvGS: Modeling View-Dependent Appearance with Environment Gaussian},
        author={Xie, Tao and Chen, Xi and Xu, Zhen and Xie, Yiman and Jin, Yudong and Shen, Yujun and Peng, Sida and Bao, Hujun and Zhou, Xiaowei},
        journal={arXiv preprint arXiv:2412.15215},
        year={2024}
    }