1 Zhejiang University
2 The University of Hong Kong
3 Ant Group
4 Shenzhen University
* Corresponding Author
This paper addresses the task of generating two-character online interactions. Prior work on two-character interaction generation has followed two main settings: (1) generating one character's motion given the counterpart's complete motion sequence, and (2) jointly generating both characters' motions from specified conditions. We argue that neither setting models real-life two-character interactions, in which humans react to their counterparts in real time and act as independent individuals. We instead propose an online reaction policy, called Ready-to-React, that generates the next character pose from past observed motions. Each character has its own reaction policy as its "brain", enabling the two to interact like real humans in a streaming manner. The policy is implemented by incorporating a diffusion head into an auto-regressive model, which dynamically responds to the counterpart's motions while effectively mitigating error accumulation during generation. We conduct comprehensive experiments on the challenging boxing task. Results demonstrate that our method outperforms existing baselines and can generate extended motion sequences. We further show that our approach can be driven by sparse control signals, making it well-suited for VR and other online interactive environments.
Overview of our online reaction policy. In the boxing scene shown in the leftmost figure, the blue agent is deciding its next move. The reaction policy proceeds in three steps: first, the history encoder encodes the current state and past observations; then, the next-latent predictor predicts the upcoming motion latent; finally, an online motion decoder decodes this latent into the actual next pose. The same reaction policy applies to the pink agent. Running both policies in a streaming fashion enables continuous generation of two-character motion sequences without length limits.
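The encode-predict-decode loop above can be sketched as a single module. This is a minimal illustrative stand-in, not the paper's architecture: the dimensions are hypothetical, the history encoder is a plain GRU, and a deterministic MLP substitutes for the diffusion head that the paper uses as the next-latent predictor.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper does not specify these here.
POSE_DIM, LATENT_DIM, HIDDEN_DIM = 66, 32, 128

class ReactionPolicy(nn.Module):
    """Sketch of the three-step loop: encode history -> predict next
    motion latent -> decode the latent into the next pose."""

    def __init__(self):
        super().__init__()
        # History encoder: summarizes the agent's own and the
        # opponent's observed past poses into a single state vector.
        self.history_encoder = nn.GRU(2 * POSE_DIM, HIDDEN_DIM,
                                      batch_first=True)
        # Next-latent predictor (the paper uses a diffusion head here;
        # a deterministic MLP stands in for brevity).
        self.latent_predictor = nn.Sequential(
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.ReLU(),
            nn.Linear(HIDDEN_DIM, LATENT_DIM))
        # Online motion decoder: motion latent -> actual next pose.
        self.motion_decoder = nn.Linear(LATENT_DIM, POSE_DIM)

    def forward(self, own_past, opp_past):
        # own_past, opp_past: (batch, time, POSE_DIM)
        obs = torch.cat([own_past, opp_past], dim=-1)
        _, h = self.history_encoder(obs)       # h: (1, batch, HIDDEN_DIM)
        z_next = self.latent_predictor(h[-1])  # (batch, LATENT_DIM)
        return self.motion_decoder(z_next)     # (batch, POSE_DIM)
```

Because the policy conditions only on past observations, the same module can be instantiated once per character and stepped frame by frame.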
Here we present our method in the reactive setting: the opponent's motion is provided as ground truth, and our policy generates the response.
Our method enables the simultaneous generation of motion for both agents. Starting from the first four frames, each agent's subsequent motion is generated from both its own and its opponent's past motions.
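The simultaneous setting can be sketched as a symmetric streaming loop in which, at every step, each agent's policy sees only the already-generated history of both characters. The `dummy_policy` below is a placeholder (it merely repeats the last pose) standing in for a learned reaction policy; the frame counts and pose dimension are illustrative assumptions.

```python
import numpy as np

POSE_DIM = 66  # hypothetical pose dimension


def dummy_policy(own_past, opp_past):
    # Placeholder for a learned reaction policy: simply repeats the
    # agent's last observed pose. Illustrative only.
    return own_past[-1]


def stream_two_agents(init_a, init_b, policy_a, policy_b, num_frames):
    """Streaming two-character generation: at each step both agents act
    on the same observed history, so neither peeks at the other's
    current-frame output."""
    hist_a, hist_b = list(init_a), list(init_b)
    for _ in range(num_frames):
        next_a = policy_a(np.asarray(hist_a), np.asarray(hist_b))
        next_b = policy_b(np.asarray(hist_b), np.asarray(hist_a))
        # Append only after both predictions are made for this frame.
        hist_a.append(next_a)
        hist_b.append(next_b)
    return np.asarray(hist_a), np.asarray(hist_b)
```

Since the loop body never references future frames, it can in principle run indefinitely, which is what allows generation without a length limit.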
Given the first four frames of a two-character motion sequence, our method successfully generates 1,800 frames of two-character motion.
Introducing sparse control is essential for deploying our method in VR and other online interactive environments. Our approach generates realistic motion while faithfully adhering to the sparse signals; here, the blue agent is controlled by a combination of sparse signals and our reaction policy.
@inproceedings{cen2025ready_to_react,
title={Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation},
author={Cen, Zhi and Pi, Huaijin and Peng, Sida and Shuai, Qing and Shen, Yujun and Bao, Hujun and Zhou, Xiaowei and Hu, Ruizhen},
booktitle={ICLR},
year={2025}
}