GaussianZoom

Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance

CVPR 2026

State Key Laboratory of CAD & CG, Zhejiang University
‡ denotes Project Lead, † denotes Corresponding Author

GaussianZoom teaser.

GaussianZoom progressively magnifies 3D scenes from low-resolution inputs, reconstructing them into multi-view consistent and detail-rich representations. The expandable continuous Level-of-Detail hierarchy organizes primitives across scales, enabling smooth and alias-free rendering throughout the zoom-in process.

Abstract

We introduce GaussianZoom, a generative zoom-in 3D reconstruction system with an iterative progressive framework that combines geometry-consistent scene modeling and multi-scale semantic reasoning to enable high-fidelity extreme zoom-in rendering from low-resolution inputs. To achieve this, we develop a novel multi-view consistent super-resolution module with depth-based feature warping and VLM-driven detail synthesis, ensuring accurate multi-view correspondence while enriching fine-scale appearance beyond the observed resolution. To support zooming across large magnification ranges, we further introduce a new expandable continuous Level-of-Detail hierarchy that dynamically modulates Gaussian visibility for smooth, alias-free cross-scale rendering. Experiments on Mip-NeRF360 and Tanks&Temples demonstrate that GaussianZoom achieves superior perceptual quality, multi-view consistency, and robustness under extreme magnification, establishing a strong baseline for generative zoom-in 3D scene reconstruction.

Method Overview

GaussianZoom Pipeline.

Our framework jointly leverages geometry-aware alignment, semantic priors, and a continuous Level-of-Detail (LoD) representation to perform generative zoom-in reconstruction. Starting from a coarse 3D Gaussian Splatting model, we derive per-view depth maps that enable depth-based feature warping, providing accurate multi-view correspondence. In parallel, coarse and zoomed-in renderings are processed by a vision-language model (VLM) to infer semantic cues describing fine-scale appearance. Together, these geometry-aligned features and semantic descriptions condition the super-resolution network, which synthesizes high-resolution zoomed views with plausible, view-consistent details. The resulting images are used to update a continuous LoD hierarchy, in which the opacity of each primitive is dynamically adjusted to enable alias-free rendering and smooth transitions across zoom levels.
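
As a rough illustration of two steps in the pipeline above, the following NumPy sketch shows (1) how a per-view depth map can carry source pixels into a target view, which is the geometric core of depth-based warping, and (2) one possible way to fade a primitive's opacity in as the zoom level increases. It is not the authors' implementation. The function names `reproject_pixels` and `lod_opacity`, the shared pinhole intrinsics `K`, the 4x4 source-to-target transform, the per-primitive `gaussian_level`, and the smoothstep fade are all assumptions made for illustration.

```python
import numpy as np


def reproject_pixels(depth, K, T_src_to_tgt):
    """Map every source pixel into a target view using its depth.

    Assumptions (hypothetical, not from the paper): both views share the
    pinhole intrinsics K (3x3), depth is z-depth along the optical axis,
    and T_src_to_tgt is a 4x4 rigid transform from source to target camera.
    Returns target pixel coordinates (H, W, 2) and target depths (H, W).
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, H*W); u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)
    # Back-project each pixel to a 3D point in the source camera frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the points into the target camera frame.
    pts_tgt = T_src_to_tgt[:3, :3] @ pts_src + T_src_to_tgt[:3, 3:4]
    # Perspective projection; assumes every point lies in front of the camera.
    proj = K @ pts_tgt
    z = proj[2]
    uv = proj[:2] / z
    return uv.T.reshape(h, w, 2), z.reshape(h, w)


def lod_opacity(base_opacity, gaussian_level, query_level, fade_width=1.0):
    """Scale opacity by a smooth visibility weight (illustrative assumption).

    A primitive assigned to `gaussian_level` is invisible while the query
    level is below gaussian_level - fade_width, fully visible once the query
    level reaches gaussian_level, and follows a smoothstep ramp in between.
    """
    t = np.clip((query_level - gaussian_level) / fade_width + 1.0, 0.0, 1.0)
    weight = t * t * (3.0 - 2.0 * t)  # smoothstep: zero slope at both ends
    return base_opacity * weight
```

In a full warping step, the returned coordinates would be used to sample target-view features (for example with bilinear interpolation), and pixels with non-positive target depth or coordinates outside the image would be masked out. The smooth ramp in `lod_opacity` is one simple way to avoid popping when primitives from a finer level become visible during zoom-in.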


BibTeX


@inproceedings{shi2026gaussianzoom,
  title     = {GaussianZoom: Progressive Zoom-in Generative 3D Gaussian Splatting with Geometric and Semantic Guidance},
  author    = {Shi, Jiale and Hu, Jiarui and Yang, Zesong and Luan, Kaixuan and Bao, Hujun and Cui, Zhaopeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}