Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

ICCV 2023

Huaijin Pi¹, Sida Peng¹, Minghui Yang², Xiaowei Zhou¹, Hujun Bao¹

¹State Key Lab of CAD & CG, Zhejiang University    ²Ant Group



This paper presents a novel approach to generating the 3D motion of a human interacting with a target object. We focus on the challenge of synthesizing long-range and diverse motions, which cannot be achieved by existing auto-regressive models or path planning-based methods. To address this challenge, we propose a hierarchical generation framework that first generates a set of milestones and then synthesizes the motion along them. Long-range motion generation is thereby reduced to synthesizing several short motion sequences guided by the milestones. Experiments on the NSM, COUCH, and SAMP datasets show that our approach outperforms previous methods by a large margin in both quality and diversity.

Pipeline Overview


Our pipeline consists of three stages. First, a goal pose is synthesized given the object. Then, a number of milestones with local poses are predicted based on the goal pose. Finally, the trajectory and the full motion sequences between the milestones are infilled.
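The control flow of the three stages can be illustrated with a minimal sketch. This is not the released implementation: every name here (`Milestone`, `sample_goal_pose`, `sample_milestones`, `infill_segment`, `generate_motion`) is hypothetical, poses are short placeholder vectors, and each learned generative model is replaced by a simple random or linear-interpolation stand-in. Only the hierarchical structure (goal, then milestones, then per-segment infilling) is meant to mirror the description above.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Milestone:
    position: Tuple[float, float]  # root position on the ground plane
    pose: List[float]              # placeholder local-pose vector


def sample_goal_pose(object_position, rng, pose_dim):
    """Stage 1 stand-in: put the goal at the object with a random pose."""
    pose = [rng.uniform(-1.0, 1.0) for _ in range(pose_dim)]
    return Milestone(position=object_position, pose=pose)


def sample_milestones(start, goal, num_milestones, rng):
    """Stage 2 stand-in: jittered waypoints between the start and the goal."""
    points = [start]
    for i in range(1, num_milestones + 1):
        t = i / (num_milestones + 1)
        x = (1 - t) * start.position[0] + t * goal.position[0] + rng.gauss(0, 0.1)
        y = (1 - t) * start.position[1] + t * goal.position[1] + rng.gauss(0, 0.1)
        pose = [(1 - t) * a + t * b for a, b in zip(start.pose, goal.pose)]
        points.append(Milestone((x, y), pose))
    points.append(goal)
    return points


def infill_segment(a, b, frames_per_segment):
    """Stage 3 stand-in: linear interpolation between two milestones."""
    frames = []
    for f in range(frames_per_segment):
        t = f / frames_per_segment
        position = (
            (1 - t) * a.position[0] + t * b.position[0],
            (1 - t) * a.position[1] + t * b.position[1],
        )
        pose = [(1 - t) * p + t * q for p, q in zip(a.pose, b.pose)]
        frames.append(Milestone(position, pose))
    return frames


def generate_motion(start, object_position, num_milestones=3,
                    frames_per_segment=10, seed=0):
    """Run the three stages in order and return (milestones, motion)."""
    rng = random.Random(seed)
    goal = sample_goal_pose(object_position, rng, pose_dim=len(start.pose))
    milestones = sample_milestones(start, goal, num_milestones, rng)
    motion = []
    # Each pair of consecutive milestones is an independent short segment.
    for a, b in zip(milestones, milestones[1:]):
        motion.extend(infill_segment(a, b, frames_per_segment))
    motion.append(goal)  # close the sequence on the goal pose
    return milestones, motion


start = Milestone(position=(0.0, 0.0), pose=[0.0, 0.0, 0.0, 0.0])
milestones, motion = generate_motion(start, object_position=(2.0, 1.0))
print(len(milestones), "milestones,", len(motion), "frames")
```

In this sketch the total sequence length grows with the number of milestones, while each call to `infill_segment` only ever handles one short segment, which is the sense in which long-range generation is reduced to several short problems.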

Qualitative Results


We thank Jintao Lu, Zhi Cen, Zizhang Li, and Kechun Xu for the valuable discussions.
This work was partially supported by the Key Research Project of Zhejiang Lab (No. K2022PG1BB01), NSFC (No. 62172364), and the Information Technology Center and State Key Lab of CAD & CG, Zhejiang University.


    author    = {Pi, Huaijin and Peng, Sida and Yang, Minghui and Zhou, Xiaowei and Bao, Hujun},
    title     = {Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {15061-15073}