1State Key Lab of CAD & CG, Zhejiang University
2Google
3The Chinese University of Hong Kong
Implicit neural rendering techniques have shown promising results for novel view synthesis. However, existing methods usually encode the entire scene as a whole, which makes them unaware of object identity and limits their ability to support high-level editing tasks such as moving or adding furniture. In this paper, we present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering with editing capability for a cluttered, real-world scene. Specifically, we design a novel two-pathway architecture, in which the scene branch encodes the scene geometry and appearance, and the object branch encodes each standalone object conditioned on learnable object activation codes. To make training feasible in heavily cluttered scenes, we propose a scene-guided training strategy that resolves the 3D space ambiguity in occluded regions and learns sharp boundaries for each object. Extensive experiments demonstrate that our system not only achieves competitive performance for static scene novel-view synthesis, but also produces realistic rendering for object-level editing.
We design a two-pathway architecture for an object-compositional neural radiance field. The scene branch renders the entire view of the scene, and also renders the background for editable scene rendering. The object branch renders each standalone object conditioned on the object activation code.
To obtain a view with object manipulation, we jointly render the transformed objects from the conditioned object branch and the surrounding background from the scene branch.
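The joint rendering step can be illustrated with a minimal sketch. It assumes that each branch has already produced per-sample depths, opacities and colors along one camera ray, and that the two sample sets are merged by depth and composited with the standard NeRF volume-rendering quadrature. The function names `composite_ray` and `render_edited_ray` are hypothetical and are not taken from the released code.

```python
import numpy as np

def composite_ray(depths, sigmas, colors):
    """Volume-render one ray from unordered samples (depth, opacity, RGB)."""
    order = np.argsort(depths)
    t, sigma, rgb = depths[order], sigmas[order], colors[order]
    # Distance between adjacent samples; the last interval is treated as very long.
    deltas = np.append(np.diff(t), 1e10)
    # Probability that the ray terminates inside each interval.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0), weights

def render_edited_ray(obj_samples, scn_samples):
    """Merge samples from the object branch (transformed objects) and the
    scene branch (background), then composite them as a single ray.

    Each argument is a tuple (depths [N], sigmas [N], colors [N, 3]).
    Assumption: object samples are already expressed as depths along the
    camera ray after applying the user-specified object transformation.
    """
    depths = np.concatenate([obj_samples[0], scn_samples[0]])
    sigmas = np.concatenate([obj_samples[1], scn_samples[1]])
    colors = np.concatenate([obj_samples[2], scn_samples[2]], axis=0)
    return composite_ray(depths, sigmas, colors)
```

Sorting the merged samples by depth lets a moved object correctly occlude the background behind it, and be occluded by whatever lies in front of it, without any special-case logic.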
We design a two-pathway architecture for an object-compositional neural radiance field. The scene branch takes the spatial coordinate $\mathbf{x}$, the interpolated scene voxel features $\boldsymbol{f}_{scn}$ at $\mathbf{x}$ and the ray direction $\mathbf{d}$ as input, and outputs the color $\mathbf{c}_{scn}$ and opacity $\sigma_{scn}$ of the scene. The object branch additionally takes the object voxel features $\boldsymbol{f}_{obj}$ and an object activation code $\boldsymbol{l}_{obj}$ as conditions, so that its output contains only the color $\mathbf{c}_{obj}$ and opacity $\sigma_{obj}$ of a specific object at its original location, with everything else removed.
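To make the inputs and outputs of the two branches concrete, here is a rough sketch of their interfaces using small, randomly initialized MLPs in NumPy. All feature dimensions, layer sizes and function names are assumptions chosen for illustration. The sketch also omits positional encoding and, for brevity, lets opacity depend on the ray direction, which a faithful NeRF-style network would avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes, chosen only for this sketch.
F_SCN, F_OBJ, L_OBJ, HIDDEN = 8, 8, 4, 32

def init_mlp(sizes):
    """Create (weight, bias) pairs for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Apply the layers with ReLU on all but the last one."""
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def head(params, inputs):
    """Map concatenated inputs to (RGB color in [0, 1], non-negative opacity)."""
    out = mlp(params, np.concatenate(inputs, axis=-1))
    color = 1.0 / (1.0 + np.exp(-out[..., :3]))  # sigmoid
    sigma = np.maximum(out[..., 3], 0.0)         # ReLU
    return color, sigma

scene_params = init_mlp([3 + F_SCN + 3, HIDDEN, 4])
object_params = init_mlp([3 + F_SCN + F_OBJ + L_OBJ + 3, HIDDEN, 4])

def scene_branch(x, f_scn, d):
    """(x, f_scn, d) -> (c_scn, sigma_scn) for the whole scene."""
    return head(scene_params, [x, f_scn, d])

def object_branch(x, f_scn, f_obj, l_obj, d):
    """Same inputs plus f_obj and the activation code l_obj
    -> (c_obj, sigma_obj) for one object only."""
    # One code per object, shared by every sampled point.
    code = np.broadcast_to(l_obj, x.shape[:-1] + l_obj.shape)
    return head(object_params, [x, f_scn, f_obj, code, d])
```

Switching `l_obj` selects which object the shared object branch renders, so a single network can serve every annotated object in the scene.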
@inproceedings{yang2021objectnerf,
title={Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering},
author={Yang, Bangbang and Zhang, Yinda and Xu, Yinghao and Li, Yijin and Zhou, Han and Bao, Hujun and Zhang, Guofeng and Cui, Zhaopeng},
booktitle = {International Conference on Computer Vision ({ICCV})},
month = {October},
year = {2021},
}
We thank Hanqing Jiang, Liyang Zhou and Jiaming Sun for their kind help in scene reconstruction and annotation for the ToyDesk dataset.