Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos

TPAMI 2024, ICCV 2021

Sida Peng¹, Zhen Xu¹, Junting Dong¹, Qianqian Wang², Shangzhan Zhang¹, Qing Shuai¹, Hujun Bao¹, Xiaowei Zhou¹

¹State Key Lab of CAD & CG, Zhejiang University ²Cornell University

Abstract

This paper extends Animatable NeRF and achieves better reconstruction and rendering results.

This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed decomposing a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, which enables them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines the blend weight field and the 3D human skeleton to produce observation-to-canonical correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation field. Moreover, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields to animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods.
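
To make the observation-to-canonical mapping concrete, here is a minimal NumPy sketch of inverse linear blend skinning. It is an illustration under stated assumptions, not the authors' implementation: the function name `observation_to_canonical`, the helper `translation`, and the array layouts are hypothetical, and the blend weights are passed in as plain arrays, whereas the abstract describes them as a learned blend weight field. Each point's per-bone transforms are blended by its weights, and the blended transform is then inverted to carry the point back to canonical space.

```python
import numpy as np


def translation(t):
    """Hypothetical helper: 4x4 rigid transform that translates by t."""
    g = np.eye(4)
    g[:3, 3] = t
    return g


def observation_to_canonical(x_obs, weights, bone_transforms):
    """Map observation-space points to canonical space with inverse LBS.

    x_obs:           (N, 3) points in observation space
    weights:         (N, K) blend weights, each row summing to 1
                     (assumed given here; learned as a field in the paper)
    bone_transforms: (K, 4, 4) per-bone canonical-to-observation transforms
                     derived from the skeleton pose (assumed layout)
    """
    # Blend the K bone transforms per point: result has shape (N, 4, 4).
    blended = np.einsum("nk,kij->nij", weights, bone_transforms)
    # Homogeneous coordinates as column vectors: (N, 4, 1).
    ones = np.ones((x_obs.shape[0], 1))
    x_h = np.concatenate([x_obs, ones], axis=1)[..., None]
    # Solve blended @ x_can = x_obs per point rather than forming an inverse.
    x_can_h = np.linalg.solve(blended, x_h)[..., 0]
    return x_can_h[:, :3]


# Toy skeleton with two bones: bone 0 stays put, bone 1 moves +1 along x.
bones = np.stack([translation([0.0, 0.0, 0.0]), translation([1.0, 0.0, 0.0])])
points = np.array([[0.5, 0.0, 0.0], [1.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
canonical = observation_to_canonical(points, w, bones)
```

In this toy setup all three observed points map back to the same canonical location, because each one is moved by the weighted blend of the two bone translations. Changing `bones` while keeping `w` fixed is the sense in which a new skeletal pose yields a new deformation field.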