PhysSkin

Real-Time and Generalizable Physics-Based Animation via Self-Supervised Neural Skinning

CVPR 2026

1State Key Laboratory of CAD & CG, Zhejiang University,
2BIGAI, 3Shenzhen University, 4University of British Columbia.
† denotes Corresponding Author
PhysSkin teaser.

PhysSkin is a generalizable, physics-informed neural skinning framework for object animation. The framework is trained directly on static 3D geometries via physics-informed self-supervision, without any annotated data. Once trained, PhysSkin performs neural skinning in a single feed-forward pass for diverse 3D shapes and discretizations, enabling real-time physics-based animation.
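In the spirit of Linear Blend Skinning, the learned skinning fields act as per-point weights that lift a small set of handle transforms to a full-space deformation. A minimal NumPy sketch of that lifting step (generic LBS, not the paper's trained fields; all array sizes are illustrative):

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Lift handle (subspace) transforms to a full-space deformation.

    points:       (N, 3) rest positions
    weights:      (N, H) per-point skinning weights (rows sum to 1)
    rotations:    (H, 3, 3) per-handle linear parts
    translations: (H, 3) per-handle translations
    """
    # Apply every handle's affine map to every point: (H, N, 3)
    transformed = np.einsum('hij,nj->hni', rotations, points) \
        + translations[:, None, :]
    # Blend per point with the skinning weights: x' = sum_h w[n, h] * A_h(x_n)
    return np.einsum('nh,hni->ni', weights, transformed)
```

With identity handle transforms the shape is reproduced exactly; in PhysSkin the weights come from the neural skinning fields rather than manual painting.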

Abstract

Achieving real-time physics-based animation that generalizes across diverse 3D shapes and discretizations remains a fundamental challenge. We introduce PhysSkin, a physics-informed framework that addresses this challenge. In the spirit of Linear Blend Skinning, we learn continuous skinning fields as basis functions that lift motion-subspace coordinates to full-space deformation, with the subspace defined by handle transformations. To generate mesh-free, discretization-agnostic, and physically consistent skinning fields that generalize across diverse 3D shapes, PhysSkin employs a new neural skinning-field autoencoder consisting of a transformer-based encoder and a cross-attention decoder. We further develop a novel physics-informed self-supervised learning strategy that incorporates on-the-fly skinning-field normalization and conflict-aware gradient correction, effectively balancing energy minimization, spatial smoothness, and orthogonality constraints. PhysSkin achieves strong performance on generalizable neural skinning and enables real-time physics-based animation.
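The abstract's "conflict-aware gradient correction" balances several training objectives whose gradients can point in opposing directions. The paper's exact rule is not given here; a common PCGrad-style sketch projects out the conflicting component of one gradient before combining (all names below are illustrative):

```python
import numpy as np

def resolve_conflict(g_main, g_aux, eps=1e-12):
    """Combine two loss gradients; if they conflict (negative dot product),
    remove g_aux's component along -g_main before summing.
    A PCGrad-style illustration, not the paper's exact correction."""
    dot = np.dot(g_aux, g_main)
    if dot < 0:
        # Project g_aux onto the plane orthogonal to g_main
        g_aux = g_aux - (dot / (np.dot(g_main, g_main) + eps)) * g_main
    return g_main + g_aux
```

When the auxiliary gradient (e.g. from a smoothness or orthogonality term) does not conflict with the main energy gradient, the update reduces to a plain sum.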

Video

Method Overview

PhysSkin Pipeline.

Given a static 3D shape, we first sample volumetric cubature points for animation and surface points for shape encoding. A shape encoder processes the surface points into shape latents, from which a set of learnable queries extracts handle latents via cross-attention. Spatial queries then attend to the handle latents through a second cross-attention to derive point-wise features, which are decoded into neural skinning fields. A geometry-only, physics-informed learning strategy optimizes the network in a self-supervised manner under physical and orthogonality constraints, so the learned skinning fields support real-time, physics-based animation.
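The two cross-attention stages can be sketched shape-wise in NumPy. This is a single-head toy without the learned query/key/value projections or the MLP decoder of the actual model; the token counts (256 shape latents, 16 handles, 1000 spatial points) and feature width are assumptions for illustration:

```python
import numpy as np

def cross_attention(queries, context):
    """Single-head cross-attention without learned projections:
    each query aggregates context tokens by softmax similarity."""
    d_k = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d_k)            # (Q, C)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)               # rows sum to 1
    return attn @ context                                  # (Q, d)

rng = np.random.default_rng(0)
shape_latents  = rng.normal(size=(256, 64))   # from the surface-point encoder
handle_queries = rng.normal(size=(16, 64))    # learnable handle queries
handle_latents = cross_attention(handle_queries, shape_latents)   # (16, 64)

spatial_queries = rng.normal(size=(1000, 64)) # embedded cubature points
point_feats = cross_attention(spatial_queries, handle_latents)    # (1000, 64)
# point_feats would then be decoded into per-point skinning fields
# over the 16 handles.
```

The key design point is that the number of handles is fixed by the learnable queries, while the number of spatial queries is arbitrary, which is what makes the skinning fields discretization-agnostic.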

Real-Time Physics-Based Animation

Generalizable Self-Supervised Neural Skinning


BibTeX


@inproceedings{lei2026physskin,
  title     = {PhysSkin: Real-Time and Generalizable Physics-Based Animation via Self-Supervised Neural Skinning},
  author    = {Lei, Yuanhang and Cheng, Tao and Li, Xingxuan and Zhao, Boming and Huang, Siyuan and Hu, Ruizhen and Chen, Peter Yichen and Bao, Hujun and Cui, Zhaopeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}