Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering

ICCV 2021

Bangbang Yang¹, Yinda Zhang², Yinghao Xu³, Yijin Li¹, Han Zhou¹, Hujun Bao¹, Guofeng Zhang¹, Zhaopeng Cui¹

¹State Key Lab of CAD & CG, Zhejiang University ²Google ³The Chinese University of Hong Kong

Abstract

Implicit neural rendering techniques have shown promising results for novel view synthesis. However, existing methods usually encode the entire scene as a whole, so the representation is generally unaware of object identity, which limits high-level editing tasks such as moving or adding furniture. In this paper, we present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering with editing capability for a cluttered, real-world scene. Specifically, we design a novel two-pathway architecture, in which the scene branch encodes the scene geometry and appearance, and the object branch encodes each standalone object conditioned on learnable object activation codes. To make training work in heavily cluttered scenes, we propose a scene-guided training strategy that resolves the 3D space ambiguity in occluded regions and learns sharp boundaries for each object. Extensive experiments demonstrate that our system not only achieves competitive performance for static scene novel-view synthesis, but also produces realistic rendering for object-level editing.

Scene Branch and Object Branch

We design a two-pathway architecture for the object-compositional neural radiance field. The scene branch renders the entire view of the scene, and also renders the background for editable scene rendering. The object branch renders each standalone object conditioned on the object activation code.

Animation Pipeline of Editable Scene Rendering

To obtain a view with object manipulation, we jointly render the transformed objects from the conditioned object branch and the surrounding background from the scene branch.
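The joint rendering described above can be sketched for a single camera ray as follows. This is a minimal illustration, not the authors' implementation: it assumes that samples from the two branches are merged by sorting along the ray and composited with the standard NeRF volume rendering quadrature, and that a moved object is rendered by mapping world-space sample points back into the object's original frame before querying the object branch. The names `to_object_frame`, `volume_render`, `render_edited_ray`, and the 4x4 matrix `obj_to_world` are hypothetical.

```python
import numpy as np

def to_object_frame(points_world, obj_to_world):
    """Map world-space points back to where the object originally sat.

    obj_to_world is a hypothetical 4x4 rigid transform describing the edit
    (how the object was moved); the object branch would be queried at the result.
    """
    homo = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    return (homo @ np.linalg.inv(obj_to_world).T)[:, :3]

def volume_render(depths, sigmas, colors):
    """Standard NeRF-style quadrature over the samples of one ray."""
    order = np.argsort(depths)                      # front-to-back ordering
    t, sigma, c = depths[order], sigmas[order], colors[order]
    # Distance between adjacent samples; the last interval is treated as very long.
    deltas = np.concatenate([np.diff(t), [1e10]])
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: fraction of light that reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * c).sum(axis=0)

def render_edited_ray(background, objects):
    """Merge (depths, sigmas, colors) tuples from both branches and composite.

    background comes from the scene branch; each entry of objects comes from
    the object branch evaluated for one transformed object (assumed interface).
    """
    parts = [background, *objects]
    depths = np.concatenate([p[0] for p in parts])
    sigmas = np.concatenate([p[1] for p in parts])
    colors = np.concatenate([p[2] for p in parts])
    return volume_render(depths, sigmas, colors)

# Toy ray: an opaque red object sample in front of an opaque green background sample.
bg = (np.array([2.0]), np.array([1e3]), np.array([[0.0, 1.0, 0.0]]))
obj = (np.array([1.0]), np.array([1e3]), np.array([[1.0, 0.0, 0.0]]))
pixel = render_edited_ray(bg, [obj])
```

In this toy case the object sample lies closer to the camera and is fully opaque, so it occludes the background and the pixel takes the object's color.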

Comparison of Scene Editing

* Dai P, Zhang Y, Li Z, et al. Neural Point Cloud Rendering via Multi-Plane Projection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 7830-7839.

Framework Overview

We design a two-pathway architecture for the object-compositional neural radiance field. The scene branch takes the spatial coordinate $\mathbf{x}$, the interpolated scene voxel features $\boldsymbol{f}_{scn}$ at $\mathbf{x}$, and the ray direction $\mathbf{d}$ as input, and outputs the color $\mathbf{c}_{scn}$ and opacity $\sigma_{scn}$ of the scene. The object branch additionally takes the object voxel features $\boldsymbol{f}_{obj}$ and an object activation code $\boldsymbol{l}_{obj}$ as conditioning, so that its output contains only the color $\mathbf{c}_{obj}$ and opacity $\sigma_{obj}$ of a specific object at its original location, with everything else removed.
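The input/output signature of the two branches can be sketched with a toy NumPy model. This is only an illustration of the data flow under stated assumptions: the layer widths, the feature and code sizes (`F_SCN`, `F_OBJ`, `L_OBJ`), the random untrained weights, the sigmoid/ReLU output heads, and the plain concatenation of inputs are all hypothetical choices, and details such as positional encoding or voxel feature interpolation are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
F_SCN, F_OBJ, L_OBJ = 8, 8, 4  # hypothetical feature and code widths

def mlp(sizes):
    """Random, untrained weights for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, h):
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

def head(raw):
    """Split raw outputs into RGB in [0, 1] and a non-negative opacity."""
    color = 1.0 / (1.0 + np.exp(-raw[..., :3]))
    sigma = np.maximum(raw[..., 3], 0.0)
    return color, sigma

scene_net = mlp([3 + F_SCN + 3, 32, 4])
object_net = mlp([3 + F_SCN + F_OBJ + L_OBJ + 3, 32, 4])

def scene_branch(x, f_scn, d):
    """(x, f_scn, d) -> (c_scn, sigma_scn) for the whole scene."""
    return head(forward(scene_net, np.concatenate([x, f_scn, d], axis=-1)))

def object_branch(x, f_scn, f_obj, l_obj, d):
    """Same inputs plus f_obj and the activation code l_obj -> (c_obj, sigma_obj)."""
    code = np.broadcast_to(l_obj, x.shape[:-1] + l_obj.shape)  # one code per sample
    inputs = np.concatenate([x, f_scn, f_obj, code, d], axis=-1)
    return head(forward(object_net, inputs))

# Five sample points along a ray, with placeholder features.
x = rng.normal(size=(5, 3))
d = np.tile([0.0, 0.0, 1.0], (5, 1))
f_scn = rng.normal(size=(5, F_SCN))
f_obj = rng.normal(size=(5, F_OBJ))
l_obj = rng.normal(size=L_OBJ)  # stands in for one learnable object activation code

c_scn, sigma_scn = scene_branch(x, f_scn, d)
c_obj, sigma_obj = object_branch(x, f_scn, f_obj, l_obj, d)
```

Swapping `l_obj` for a different code is what would select a different object in a trained model; here the weights are random, so the outputs only demonstrate shapes and value ranges.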