Relightable and Animatable Neural Avatar from Sparse-View Video
CVPR 2024 (Highlight)

  • 1 Zhejiang University
  • 2 Stanford University
  • 3 University of Illinois Urbana-Champaign

Reconstructing a relightable and animatable neural avatar from sparse-view (or monocular) video.

Abstract

This paper tackles the challenge of creating relightable and animatable neural avatars from sparse-view (or even monocular) videos of dynamic humans under unknown illumination. Compared to studio environments, this setting is more practical and accessible but poses an extremely challenging ill-posed problem. Previous neural human reconstruction methods can reconstruct animatable avatars from sparse views using deformed Signed Distance Fields (SDFs) but cannot recover material parameters for relighting. While differentiable inverse rendering has succeeded in material recovery for static objects, it is not straightforward to extend it to dynamic humans, as computing pixel-surface intersections and light visibility on deformed SDFs is computationally intensive. To solve this challenge, we propose a Hierarchical Distance Query (HDQ) algorithm to approximate world-space distances under arbitrary human poses. Specifically, we estimate coarse distances based on a parametric human model and compute fine distances by exploiting the local deformation invariance of the SDF. Based on the HDQ algorithm, we leverage sphere tracing to efficiently estimate surface intersections and light visibility. This allows us to develop the first system to recover animatable and relightable neural avatars from sparse-view (or monocular) inputs. Experiments demonstrate that our approach produces superior results compared to state-of-the-art methods. Our code will be released for reproducibility.
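
To make the two-level distance estimate concrete, here is a minimal PyTorch sketch of how such a hierarchical distance query could be composed. The names `warp_to_canonical` and `canonical_sdf` are hypothetical stand-ins for the learned backward deformation and canonical SDF networks; this illustrates the idea under those assumptions and is not the released implementation.

```python
import torch

def hierarchical_distance_query(x_world, posed_vertices, warp_to_canonical,
                                canonical_sdf, tau=0.1):
    """Approximate the world-space distance at posed query points.

    Hypothetical sketch: `warp_to_canonical` and `canonical_sdf` stand in
    for the learned backward deformation and canonical SDF networks.
    """
    # Coarse distance: unsigned distance to the posed parametric body
    # model (e.g., SMPL) vertices -- cheap and valid far from the surface.
    d_coarse = torch.cdist(x_world, posed_vertices).min(dim=-1).values

    # Fine distance: warp the point back to canonical space and query the
    # canonical SDF; by local deformation invariance, this value
    # approximates the true world-space distance near the surface.
    d_fine = canonical_sdf(warp_to_canonical(x_world))

    # Trust the fine distance only close to the body; fall back to the
    # conservative coarse estimate elsewhere.
    return torch.where(d_coarse < tau, d_fine, d_coarse)
```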

Introduction Video

Method

Overview of the proposed approach. Given world-space camera rays, we perform sphere tracing on the hierarchically queried distances to find surface intersections and their canonical correspondences. Light rays generated by an optimizable light probe are also sphere-traced with HDQ to compute the closest distance along each ray for soft visibility. Material properties and surface normals are queried at the canonical correspondences and warped to world space. Finally, pixel colors are computed using the rendering equation.
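
As a rough illustration of how the queried distances drive both camera-ray intersection and light visibility, below is a minimal PyTorch sketch of sphere tracing plus an SDF-style soft shadow term. Here `distance_fn` is assumed to be the hierarchical distance query sketched above, and the soft-visibility formula (the classic minimum of k·d/t along the light ray) is an assumption; the paper's exact formulation may differ.

```python
import torch

def sphere_trace(ray_o, ray_d, distance_fn, n_steps=64, eps=1e-3):
    """March camera rays through the distance field until they hit the
    surface. `distance_fn` is assumed to return a conservative
    world-space distance, such as the hierarchical query above."""
    t = torch.zeros(ray_o.shape[0], device=ray_o.device)
    hit = torch.zeros(ray_o.shape[0], dtype=torch.bool, device=ray_o.device)
    for _ in range(n_steps):
        d = distance_fn(ray_o + t[:, None] * ray_d)
        hit = hit | (d.abs() < eps)
        # Step each un-hit ray forward by its queried distance; a
        # conservative distance bound guarantees we never overshoot.
        t = torch.where(hit, t, t + d)
    return ray_o + t[:, None] * ray_d, hit

def soft_visibility(x_surf, light_dir, distance_fn, k=8.0, n_steps=32,
                    t_min=5e-2, t_max=2.0):
    """Soft shadow term from the closest distance encountered along each
    light ray, in the spirit of classic SDF soft shadows (an assumed
    formulation, not necessarily the paper's exact one)."""
    t = torch.full((x_surf.shape[0],), t_min, device=x_surf.device)
    vis = torch.ones_like(t)
    for _ in range(n_steps):
        d = distance_fn(x_surf + t[:, None] * light_dir)
        # The nearer the ray passes to the surface, the darker the shadow.
        vis = torch.minimum(vis, (k * d / t).clamp(0.0, 1.0))
        t = (t + d.clamp(min=1e-2)).clamp(max=t_max)
    return vis
```

In the full pipeline, this visibility term would modulate the radiance arriving from the light probe inside the rendering equation when shading the material properties warped from canonical space.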

Qualitative Results

Inverse rendering of relightable properties from sparse-view videos. Animated with novel poses.

Inverse rendering of relightable properties from monocular videos. Animated with novel poses.

Relighting and animation of the reconstructed avatar (from sparse-view videos) with several light probes.

Relighting and animation of the reconstructed avatar (from monocular videos) with several light probes.

Citation

The website template was borrowed from Michaël Gharbi and Jon Barron. Last updated: 08/06/2023.