4K4D: Real-Time 4D View Synthesis at 4K Resolution

CVPR 2024


Zhen Xu1  Sida Peng1   Haotong Lin1   Guangzhao He1   Jiaming Sun1   Yujun Shen2   Hujun Bao1   Xiaowei Zhou1

1Zhejiang University   2Ant Group

Real-time rendering demos on the Mobile-Stage, ENeRF-Outdoor and DNA-Rendering datasets, with 24, 18 and 60 views respectively.

Abstract


This paper targets high-fidelity and real-time view synthesis of dynamic 3D scenes at 4K resolution. Recent methods for dynamic view synthesis have shown impressive rendering quality, but their speed remains limited when rendering high-resolution images. To overcome this problem, we propose 4K4D, a 4D point cloud representation that supports hardware rasterization and enables unprecedented rendering speed. Our representation is built on a 4D feature grid so that the points are naturally regularized and can be robustly optimized. In addition, we design a novel hybrid appearance model that significantly boosts the rendering quality while preserving efficiency. Moreover, we develop a differentiable depth peeling algorithm to effectively learn the proposed model from RGB videos. Experiments show that our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30$\times$ faster than previous methods and achieves state-of-the-art rendering quality.

Method


(a) By applying the space-carving algorithm, we extract the initial point cloud sequence $\mathbf{x}, t$ of the target scene. A 4D feature grid is predefined to assign a feature vector to each point, which is then fed into MLPs to predict the scene geometry and appearance. (b) The geometry model is based on the point location, radius, and density, which forms a semi-transparent point cloud. (c) The appearance model consists of a piece-wise constant IBR term $\mathbf{c}_{ibr}$ and a continuous SH model $\mathbf{c}_{sh}$. (d) The proposed representation is learned from multi-view RGB videos through the differentiable depth peeling algorithm.
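To make step (c) concrete, the hybrid appearance model combines a view-independent IBR color with a view-dependent residual evaluated from spherical harmonics. Below is a minimal illustrative sketch of this combination for a single point, assuming degree-1 SH and a hypothetical `(4, 3)` coefficient layout; the actual per-point color computation in the paper may differ in SH order and blending details.

```python
import numpy as np

def appearance_color(c_ibr, sh_coeffs, view_dir):
    """Combine an IBR base color with a view-dependent SH residual.

    c_ibr:     (3,) image-based rendering color for this point
    sh_coeffs: (4, 3) degree-1 real SH coefficients, one set per channel
               (hypothetical layout for illustration)
    view_dir:  (3,) unit viewing direction
    """
    x, y, z = view_dir
    # Real spherical harmonics basis up to degree 1
    basis = np.array([
        0.28209479177387814,       # Y_0^0 (constant term)
        -0.4886025119029199 * y,   # Y_1^{-1}
        0.4886025119029199 * z,    # Y_1^0
        -0.4886025119029199 * x,   # Y_1^1
    ])
    c_sh = basis @ sh_coeffs       # (3,) view-dependent residual
    return np.clip(c_ibr + c_sh, 0.0, 1.0)
```

With zero SH coefficients the output reduces to the IBR color alone, which matches the intuition that the SH term models only the view-dependent residual on top of the blended input-view colors.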
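For step (d), depth peeling yields, per pixel, the K nearest semi-transparent point layers sorted front to back, which are then alpha-composited; the compositing is differentiable so gradients flow back to the point attributes. A minimal NumPy sketch of that compositing step (assuming the K layers' colors and opacities have already been rasterized) might look like:

```python
import numpy as np

def composite_depth_peeled(colors, alphas):
    """Alpha-composite K depth-peeled layers, front to back.

    colors: (K, H, W, 3) per-layer colors from the K peeling passes
    alphas: (K, H, W) per-layer opacities
    Returns the (H, W, 3) composited image:
        C = sum_k T_k * alpha_k * c_k,  T_k = prod_{j<k} (1 - alpha_j)
    """
    image = np.zeros(colors.shape[1:])
    transmittance = np.ones_like(alphas[0])  # T_0 = 1
    for k in range(alphas.shape[0]):
        image += (transmittance * alphas[k])[..., None] * colors[k]
        transmittance *= 1.0 - alphas[k]
    return image
```

In the real system each peeling pass is a hardware rasterization that discards fragments nearer than the previous pass's depth buffer; the sketch above only shows the compositing that follows.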

Qualitative Results


More Qualitative Results on the DNA-Rendering Dataset
More Qualitative Results on the Mobile-Stage Dataset
More Qualitative Results on the ENeRF-Outdoor Dataset
More Qualitative Results on the Neural3DV Dataset
More Qualitative Results on the NHR Dataset
More Qualitative Results on the ZJU-MoCap Dataset

Real-Time Demos


More Real-Time Demos on the NHR Dataset
More Real-Time Demos on the ZJU-MoCap Dataset
More Real-Time Demos on the Mobile-Stage Dataset
More Real-Time Demos on the Neural3DV Dataset

Baseline Comparisons


Comparisons with K-Planes
Comparisons with ENeRF


Citation


@inproceedings{xu20244k4d,
  title={4K4D: Real-Time 4D View Synthesis at 4K Resolution},
  author={Xu, Zhen and Peng, Sida and Lin, Haotong and He, Guangzhao and Sun, Jiaming and Shen, Yujun and Bao, Hujun and Zhou, Xiaowei},
  booktitle={CVPR},
  year={2024}
}