¹State Key Lab of CAD & CG, Zhejiang University
²Google
We, as human beings, can understand and picture a familiar scene from arbitrary viewpoints given a single image, whereas this remains a grand challenge for computers. We present a novel solution that mimics this human perception capability, based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene. Specifically, we first learn the prior knowledge of the objects in a closed scene in an offline stage, which enables an online stage to understand the room with an unseen furniture arrangement. During the online stage, given a panoramic image of the scene in a different layout, we utilize a holistic neural-rendering-based optimization framework to efficiently estimate the correct 3D scene layout and deliver realistic free-viewpoint rendering. To handle the domain gap between the offline and online stages, our method exploits compositional neural rendering techniques for data augmentation during offline training. Experiments on both synthetic and real datasets demonstrate that our two-stage design achieves robust 3D scene understanding and outperforms competing methods by a large margin, and we also show that our realistic free-viewpoint rendering enables various applications, including scene touring and editing.
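As a rough illustration of the online stage, the sketch below optimizes per-object rigid poses and a global lighting latent against a single captured panorama through a differentiable renderer. This is a minimal sketch, not the paper's implementation: `ToyPanoRenderer`, `fit_scene`, the pose parameterization, and the latent size are all hypothetical placeholders, and the toy MLP stands in for the actual compositional neural renderer so the example runs end to end.

```python
import torch
import torch.nn as nn

class ToyPanoRenderer(nn.Module):
    """Toy stand-in for a compositional neural renderer (hypothetical)."""
    def __init__(self, n_objects, light_dim=32, pano_hw=(64, 128)):
        super().__init__()
        self.pano_hw = pano_hw
        self.net = nn.Sequential(
            nn.Linear(n_objects * 4 + light_dim, 256),
            nn.ReLU(),
            nn.Linear(256, pano_hw[0] * pano_hw[1] * 3),
        )

    def forward(self, poses, light_code):
        # poses: (n_objects, 4) rigid parameters; light_code: (light_dim,)
        x = torch.cat([poses.flatten(), light_code])
        return self.net(x).view(*self.pano_hw, 3)

def fit_scene(renderer, target_pano, n_objects, num_steps=500):
    # Free variables of the online stage: one rigid pose (x, y, z, yaw)
    # per object plus a global lighting latent. The renderer itself stays
    # frozen, since the object priors were learned in the offline stage.
    poses = torch.zeros(n_objects, 4, requires_grad=True)
    light_code = torch.zeros(32, requires_grad=True)
    opt = torch.optim.Adam([poses, light_code], lr=1e-2)
    for _ in range(num_steps):
        opt.zero_grad()
        pred = renderer(poses, light_code)                # differentiable render
        loss = nn.functional.mse_loss(pred, target_pano)  # photometric loss
        loss.backward()
        opt.step()
    return poses.detach(), light_code.detach()
```

Once the poses and lighting latent are fitted, rendering from any other viewpoint with the same parameters yields the free-viewpoint touring described above.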
We focus on a scenario where a service robot operates in a specific indoor environment (e.g., a household, office, or museum). The robot can therefore collect information about the closed scene in an offline stage, and then provide effective amodal scene understanding from a single panoramic capture of the room, which facilitates high-level tasks and delivers immersive free-viewpoint touring with synchronized illumination variation and scene editing.
Given a single panoramic image of a real-world closed room, our method optimizes the scene lighting and object arrangement, enabling free-viewpoint scene touring.
Thanks to lighting augmentation, our model adapts to different lighting conditions with only one-time data collection. We can then switch the rendering between cold and warm lighting by swapping the lighting parameters.
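To make the lighting swap concrete, here is a hypothetical continuation of the sketch above: the scene is fitted once per captured lighting condition, and swapping only the lighting latent re-renders the same layout under the other illumination. `warm_pano` and `cold_pano` are assumed (64, 128, 3) panoramic captures of the same room.

```python
# All names here are the illustrative placeholders from the earlier sketch.
renderer = ToyPanoRenderer(n_objects=3)
poses, warm_code = fit_scene(renderer, warm_pano, n_objects=3)
_, cold_code = fit_scene(renderer, cold_pano, n_objects=3)

warm_view = renderer(poses, warm_code)  # scene under warm lighting
cold_view = renderer(poses, cold_code)  # same layout, cold lighting
```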
We can conduct scene editing by inserting virtual objects into the real-world scene and touring in it. Note that the piano and the stool are reconstructed from the iG-Synthetic dataset and have been seamlessly inserted into the Fresh-Room scene.
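Such insertion relies on compositional rendering: each object contributes density and color samples along a camera ray, and merging all samples by depth before alpha-compositing resolves occlusions between the pre-captured and inserted objects. The sketch below shows one standard formulation of this merge, not necessarily the paper's exact one; names and shapes are illustrative.

```python
import torch

def composite_ray(samples):
    # samples: list of (t, sigma, rgb) tuples, one per object, with shapes
    # (N_i,), (N_i,), (N_i, 3); t is depth along the ray.
    t = torch.cat([s[0] for s in samples])
    sigma = torch.cat([s[1] for s in samples])
    rgb = torch.cat([s[2] for s in samples])
    order = torch.argsort(t)                     # sort all samples by depth
    t, sigma, rgb = t[order], sigma[order], rgb[order]

    delta = torch.diff(t, append=t[-1:] + 1e10)  # spacing between samples
    alpha = 1.0 - torch.exp(-sigma * delta)      # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )                                            # accumulated transmittance
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)   # final pixel color
```

Because occlusion falls out of the depth sort, an inserted object needs no per-scene retraining: its ray samples simply join the composite.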
First, we capture one panoramic image of a synthetic room and fit the scene arrangement and lighting condition with our optimization. Then, we can conduct free-viewpoint scene touring, where the object placement and lighting condition remain consistent with the input image.
@article{yang2022_nr_in_a_room,
  title      = {Neural Rendering in a Room: Amodal 3D Understanding and Free-Viewpoint Rendering for the Closed Scene Composed of Pre-Captured Objects},
  author     = {Yang, Bangbang and Zhang, Yinda and Li, Yijin and Cui, Zhaopeng and Fanello, Sean and Bao, Hujun and Zhang, Guofeng},
  journal    = {ACM Trans. Graph.},
  issue_date = {July 2022},
  volume     = {41},
  number     = {4},
  month      = jul,
  year       = {2022},
  pages      = {101:1--101:10},
  articleno  = {101},
  numpages   = {10},
  url        = {https://doi.org/10.1145/3528223.3530163},
  doi        = {10.1145/3528223.3530163},
  publisher  = {ACM},
  address    = {New York, NY, USA}
}