EnvGS: Modeling View-Dependent Appearance with Environment Gaussian


Zhejiang University · Ant Group

Abstract

Reconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis. Existing methods that utilize environment maps to model reflections from distant lighting often struggle with high-frequency reflection details and fail to account for near-field reflections. In this work, we introduce EnvGS, a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of the environment. These environment Gaussian primitives are combined with base Gaussian primitives to model the appearance of the whole scene. To render the environment Gaussian primitives efficiently, we develop a ray-tracing-based renderer that leverages the GPU's RT cores for fast rendering. This allows us to jointly optimize our model for high-quality reconstruction while maintaining real-time rendering speeds. Results on multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections, achieving the best rendering quality in real-time novel view synthesis.
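To make the composition concrete, the appearance model described above can be sketched as follows. Writing n and v for the per-pixel normal and view direction obtained from the base Gaussians, β for the blending weight, and c_env for the color obtained by ray tracing the environment Gaussians from the surface point x, one plausible formulation is (our notation; the convex blend is an assumption and the paper's exact weighting may differ):

    \mathbf{r} = 2(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n} - \mathbf{v},
    \qquad
    \mathbf{c} = (1-\beta)\,\mathbf{c}_{\mathrm{base}} + \beta\,\mathbf{c}_{\mathrm{env}}(\mathbf{x},\,\mathbf{r})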

Method

Overview of EnvGS. The rendering process begins by rasterizing the base Gaussian to obtain per-pixel normals, base colors, and blending weights. Next, we render the environment Gaussian in the reflection direction using our ray-tracing-based Gaussian renderer to capture the reflection colors. Finally, we combine the reflection and base colors for the final output. We jointly optimize the environment Gaussian and base Gaussian using monocular normals and ground truth images for supervision.
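As a concrete illustration of the per-pixel math in the reflection and compositing steps, below is a minimal, self-contained NumPy sketch. It is not the authors' implementation: the convex blend and the constant placeholder reflection color are assumptions for illustration; in EnvGS the reflection color comes from ray tracing the environment Gaussians with the RT-core-based renderer.

    import numpy as np

    def reflect(view_dir, normal):
        """Mirror the per-pixel view direction about the surface normal.

        Both inputs are (H, W, 3) arrays; view_dir points from the surface
        toward the camera and normal is unit length. Returns r = 2(n.v)n - v.
        """
        cos = np.sum(normal * view_dir, axis=-1, keepdims=True)
        return 2.0 * cos * normal - view_dir

    def composite(base_color, refl_color, blend_weight):
        """Blend base and reflection colors (a convex blend; an assumption,
        not necessarily the paper's exact weighting)."""
        return (1.0 - blend_weight) * base_color + blend_weight * refl_color

    if __name__ == "__main__":
        H, W = 4, 4
        n = np.tile([0.0, 0.0, 1.0], (H, W, 1))   # flat, camera-facing surface
        v = np.tile([0.0, 0.6, 0.8], (H, W, 1))   # unit view direction
        r = reflect(v, n)                          # -> [0, -0.6, 0.8] per pixel
        # In EnvGS, refl_color would come from tracing the environment
        # Gaussians along r; a constant stands in for it here.
        refl_color = np.full((H, W, 3), 0.9)
        base_color = np.full((H, W, 3), 0.2)
        blend_weight = np.full((H, W, 1), 0.3)
        print(composite(base_color, refl_color, blend_weight)[0, 0])  # [0.41 0.41 0.41]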

Results and Comparisons

Here we demonstrate side-by-side videos comparing our method to top-performing baselines across different captured scenes.

Notice how our method synthesizes accurate reflections of the houses and plants that smoothly move over the car's surface, while baseline methods produce fuzzy reflections that fade in and out depending on the viewpoint.
Our method renders high-quality reflections of the distant scene beyond the windows as well as near-field reflections of the nearby paintings and bowl.
Our method synthesizes accurate reflections of the trees and lamppost that move smoothly across the car's surface and windows, even though these objects are not directly visible in the training images; the baseline methods fail to capture them.
Our method better captures high-frequency reflection details, and is able to render convincing reflections of near-field content. Notice the accurate interreflections of the shiny spheres and statue head.
Our method renders high-quality reflections of the surrounding mall environment, including the ceiling lights and the nearby buildings, while baseline methods fail to capture these details.
Observe how our method synthesizes accurate and smooth reflections of the surrounding houses on the car's surface, as well as the near-field reflections of the parking lines on the car's side door. In contrast, baseline methods produce blurry reflections that fade in and out depending on the viewpoint and fail to accurately capture near-field reflections.
Although semi-transparent yet reflective surfaces such as windows are challenging for our method, it still simulates convincing reflections across the car's body and windshield.
Notice the black reflective surface and the window light reflected on the tabletop. Our method maintains consistent specular highlights across different viewpoints, rather than having them appear and disappear with the viewing angle.

Ablation Studies

Here we demonstrate side-by-side videos comparing our full method to variants with key components ablated on the gardenspheres scene. See the paper for more details.

The "w/o joint optimization" ablation detaches the joint optimization of the base Gaussian and the environment Gaussian from the reflection rendering step, this variant fails to recover accurate geometry, leading to inferior reflection reconstruction and rendering quality. The "w/o monocular normal loss" variant removes the monocular normal constraint, training may become trapped in largely incorrect geometry, resulting in inaccurate reflection reconstruction. The "w/ environment map" variant replaces our core Gaussian environment representation with an environment map representation while keeping all other components unchanged, the result illustrates that while it effectively captures smooth distant reflections, it has difficulty modeling near-field and high-frequency reflections, and produces more bumpy geometry. The "w/o color sabotage" and "w/o normal propagation" ablations remove the color sabotage and normal propagation components, respectively, both variants result in reduced rendering quality.
On this scene, without joint optimization or without color sabotage, sharp details such as the tree branches reflected in the spheres are not accurately reconstructed. Replacing the environment Gaussian with an environment map (the "w/ environment map" variant) causes rough surfaces to be poorly reconstructed and introduces artifacts during rendering. The monocular normal loss is likewise essential for modeling near-field inter-reflections.
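For readers curious about what the monocular normal constraint might look like in practice, below is a small sketch of one common formulation: a cosine penalty between rendered normals and normals predicted by a monocular estimator. This is an illustrative assumption, not the paper's exact loss.

    import numpy as np

    def monocular_normal_loss(rendered_n, mono_n, eps=1e-8):
        """Penalize disagreement between rendered and monocular normals.

        rendered_n, mono_n: (H, W, 3) arrays. Uses 1 - cosine similarity,
        a common choice for normal supervision (assumption, not the
        paper's exact loss).
        """
        rendered_n = rendered_n / (np.linalg.norm(rendered_n, axis=-1, keepdims=True) + eps)
        mono_n = mono_n / (np.linalg.norm(mono_n, axis=-1, keepdims=True) + eps)
        cos = np.sum(rendered_n * mono_n, axis=-1)
        return np.mean(1.0 - cos)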

Related Works

  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  • 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
  • Gaussian Ray Tracing: Fast Tracing of Particle Scenes
  • Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
  • NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections
  • GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
  • 3D Gaussian Splatting with Deferred Reflection

Citation

    
    @article{xie2024envgs,
        title={EnvGS: Modeling View-Dependent Appearance with Environment Gaussian},
        author={Xie, Tao and Chen, Xi and Xu, Zhen and Xie, Yiman and Jin, Yudong and Shen, Yujun and Peng, Sida and Bao, Hujun and Zhou, Xiaowei},
        journal={arXiv preprint arXiv:2412.15215},
        year={2024}
    }