D-Prism: Differentiable Primitives for Structured Dynamic Modeling

CVPR 2026

Xingyuan Yu¹, Yijin Li¹, Chong Zeng², Yuhang Ming³, Hujun Bao¹, Guofeng Zhang¹†

¹State Key Lab of CAD & CG, Zhejiang University, ²Stanford University, ³Hangzhou Dianzi University


D-Prism is a framework that uses structured primitives to reconstruct dynamic geometry from monocular inputs.

Comparison panels: Ground Truth, DG-Mesh (Mesh), DG-Mesh (Rendered), Ours (Primitives), Ours (Rendered).

Abstract

Capturing both geometry and rigid motion for structured dynamic objects, like multi-part assemblies or jointed mechanisms, remains a key challenge. Existing dynamic methods, such as deformable meshes or 3DGS, rely on unstructured representations and fail to jointly model suitable geometry and articulated motion. Primitive-based methods excel at structured static scenes, but their dynamic potential is still unexplored. We propose D-Prism, the first framework to achieve high-fidelity structured dynamic modeling by extending differentiable primitives to the dynamic domain. Specifically, we bind 3DGS to primitive surfaces, leveraging their respective strengths in appearance and geometry. We introduce a deformation network to control primitive motion, ensuring it accurately matches the object's movement. Furthermore, we design a novel adaptive control strategy to dynamically adjust primitive counts, better matching objects' true spatial footprint. Experiments confirm that our method excels at structured dynamic modeling, providing both structured geometry and precise motion tracking.

Pipeline

Overview of D-Prism. Given calibrated monocular images and masks, our method learns the structured dynamic geometry and appearance for the sequence. Starting from randomly initialized primitives with learnable shape, scale, opacity, and motion parameters, deformation networks (backward and forward mapping) model the object's underlying motion and drive the primitives across timesteps. Meanwhile, a novel Primitive Adaptive Control strategy manages their count and distribution via merge, clone, and prune operations to enhance our framework's representational ability. Finally, 3D Gaussians are bound to primitive surfaces for high-quality rasterization, supervised by RGB and mask losses.
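One way to picture the binding step is to store each Gaussian center in the local frame of its primitive and carry it along with the primitive's rigid pose at every timestep. The sketch below assumes exactly that. The function name `gaussians_to_world` and the rotation-plus-translation pose are illustrative assumptions, not the parameterization used by D-Prism.

```python
import numpy as np

def gaussians_to_world(local_means, rotation, translation):
    """Map Gaussian centers from a primitive's local frame to world space.

    local_means: (N, 3) centers relative to the primitive (assumed layout).
    rotation:    (3, 3) rotation of the primitive at the current timestep.
    translation: (3,)   translation of the primitive at the current timestep.
    """
    # Row-vector convention: x_world = R @ x_local + t for every row.
    return local_means @ rotation.T + translation

# Hypothetical pose: 90 degree rotation about z, then a shift along x.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
shift = np.array([1.0, 0.0, 0.0])

local = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5]])
world = gaussians_to_world(local, rot_z_90, shift)
```

Because the Gaussians only store local coordinates in this sketch, moving a primitive moves its appearance with it, which is the property the editing applications on this page rely on.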

Quantitative Results

Structured Motion Tracking Results on Dynamic Primitive Dataset

Structured motion tracking results. We evaluate motion tracking accuracy using End-Point Error (EPE) and accuracy thresholds (δ_3D) across six structured dynamic object categories. Our method demonstrates superior or highly competitive performance compared to Shape of Motion, Ub4D, MovingParts, SP-GS, and DG-Mesh.
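As a rough illustration of the two metrics, EPE can be computed as the mean Euclidean distance between predicted and ground-truth 3D end points, and δ_3D as the fraction of points whose error falls below a threshold. The threshold value `0.05` and the function name below are assumptions made for this sketch; the page does not state the thresholds used in the evaluation.

```python
import numpy as np

def epe_and_accuracy(pred, gt, threshold=0.05):
    """Return (mean end-point error, fraction of points under threshold).

    pred, gt: (N, 3) arrays of tracked 3D end points.
    threshold: assumed distance cutoff, in the same units as the points.
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-point Euclidean error
    return float(err.mean()), float((err < threshold).mean())

# Toy data: only the second point is off, by 0.1 along z.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.1], [0.0, 2.0, 0.0]])
epe, acc = epe_and_accuracy(pred, gt, threshold=0.05)
```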

Structured Dynamic Reconstruction Results on Dynamic Primitive Dataset

Dynamic geometry reconstruction results. We compare our method against Ub4D and DG-Mesh using Dynamic Chamfer Distance (CD_d) and Dynamic Earth Mover's Distance (EMD_d) metrics across six structured dynamic object categories. Our method achieves the best results on most scenes.
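For readers unfamiliar with the metric, the sketch below computes a symmetric Chamfer distance between two point sets and averages it over frames. Averaging per-frame distances is an assumption about how the dynamic variant CD_d is defined; the page does not give the exact formula, and the function names are illustrative.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbor distance in both directions.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def dynamic_chamfer(pred_frames, gt_frames):
    """Assumed dynamic variant: mean of per-frame Chamfer distances."""
    per_frame = [chamfer_distance(p, g) for p, g in zip(pred_frames, gt_frames)]
    return float(np.mean(per_frame))

# Two frames: the first matches exactly, the second is offset by 0.5 along z.
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
moved = base + np.array([0.0, 0.0, 0.5])
cd_static = chamfer_distance(base, moved)
cd_dynamic = dynamic_chamfer([base, base], [base, moved])
```

The brute-force pairwise matrix is fine for small point sets; larger clouds would typically use a nearest-neighbor structure instead.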

Visualization Results

Dynamic Primitive Dataset

Scenes, each shown as Primitives and Rendered: Treasure Box, Rubik's Cube, Door, Pliers, Folding Chair, Sunglasses.

Real-World Robotic Scenes

Scenes, each shown as Primitives and Rendered: Dexterous Hand, Robotic Arm.

D-NeRF Dataset

Scenes, each shown as Primitives and Rendered: Hell Warrior, Hook, Jumping Jacks, Mutant, Stand Up.

Applications

Our method provides a structured geometric representation that enables easy editing of the reconstructed results. Below we demonstrate several applications powered by our primitive-based representation.

Motion Pattern Transfer

By exchanging motion parameters between different objects' primitives, we can transfer motion patterns across objects. For example, we transfer the opening motion of the Treasure Box lid to the top layer of the Rubik's Cube, or apply the rotation of the Rubik's Cube to the Treasure Box lid.
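The idea can be sketched with a toy parameterization in which each primitive carries a rotation axis, a pivot, and one angle per timestep. The `Primitive` class, its fields, and `transfer_motion` are hypothetical names for illustration; the actual motion parameters in D-Prism may be represented differently.

```python
import numpy as np
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Primitive:
    """Hypothetical primitive: geometry (center, pivot) plus motion (axis, angles)."""
    center: np.ndarray  # (3,) rest position
    pivot: np.ndarray   # (3,) point the primitive rotates about
    axis: np.ndarray    # (3,) rotation axis
    angles: np.ndarray  # (T,) rotation angle in radians per timestep

def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation of `angle` radians about `axis`."""
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def center_at(prim, t):
    """Position of the primitive center at timestep t."""
    r = rotation_matrix(prim.axis, prim.angles[t])
    return r @ (prim.center - prim.pivot) + prim.pivot

def transfer_motion(source, target):
    """Copy the motion (axis, angles) of source onto target, keeping its geometry."""
    return replace(target, axis=source.axis, angles=source.angles)

origin = np.zeros(3)
lid = Primitive(np.array([0.0, 1.0, 0.0]), origin,
                np.array([1.0, 0.0, 0.0]), np.array([0.0, np.pi / 2]))
cube_top = Primitive(np.array([0.0, 1.0, 0.0]), origin,
                     np.array([0.0, 0.0, 1.0]), np.array([0.0, np.pi / 2]))

cube_with_lid_motion = transfer_motion(lid, cube_top)
```

In this sketch the target keeps its own pivot, so the borrowed motion is applied around the target's original hinge location. That is a design choice of the example, not something the page specifies.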

Examples, each shown as Primitives and Rendered: Box with Cube's rotation, Cube with Box's opening.

Effortless Articulation

Leveraging our structured representation, we can control limb movements in Blender by simply setting up a skeleton and Inverse Kinematics (IK) parameters based on primitive parent-child relationships, with no skinning required. This makes articulating any reconstructed subject straightforward.

Panels: Primitives, Rendered.

Exploded View

Since each primitive represents a distinct part, we can easily create exploded-view visualizations by translating primitives outward from the object center, revealing the internal structure of the reconstruction.
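A minimal sketch of this translation, assuming each primitive is summarized by its center and that the object center is the mean of those centers. The function name `explode` and the `strength` parameter are illustrative, not part of the released method.

```python
import numpy as np

def explode(centers, strength=0.5):
    """Push primitive centers away from the object centroid.

    centers:  (N, 3) primitive centers.
    strength: assumed scale factor; 0 leaves the object unchanged.
    """
    centroid = centers.mean(axis=0)
    # Each primitive moves along the ray from the centroid through its center.
    return centers + strength * (centers - centroid)

centers = np.array([[0.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0]])
exploded = explode(centers, strength=0.5)
```

Applying the same offset to everything attached to a primitive keeps each part rigid while the parts separate from one another.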

Panels: Primitives, Rendered.

Physics-based Motion Simulation

By controlling primitive motion parameters, we can simulate physics-based effects on individual object parts. For instance, we simulate the Rubik's Cube rotating upon impact and gradually stopping due to friction, or the Treasure Box lid accelerating open under force and slowly settling back due to joint damping.
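The friction-stop effect can be sketched by generating a per-frame rotation angle for one primitive: it starts with an initial angular velocity and decelerates at a constant rate until it stops. The constant-deceleration friction model, the parameter values, and the function name are assumptions chosen for this illustration.

```python
import numpy as np

def simulate_friction_stop(omega0, decel, dt=0.01, steps=500):
    """Angles (radians) of a part spinning down under constant deceleration.

    omega0: initial angular velocity in rad/s (e.g. right after an impact).
    decel:  assumed constant angular deceleration from friction, in rad/s^2.
    """
    theta, omega = 0.0, omega0
    angles = []
    for _ in range(steps):
        # Friction reduces the velocity but never reverses the spin.
        omega = max(omega - decel * dt, 0.0)
        theta += omega * dt
        angles.append(theta)
    return np.array(angles)

angles = simulate_friction_stop(omega0=2.0, decel=1.0)
```

The resulting angle sequence could then be written into the motion parameters of the chosen primitive, one value per frame. In the continuous limit the part turns through omega0² / (2 · decel) radians before stopping, which gives a quick sanity check on the output.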

Examples, each shown as Primitives and Rendered: Box (Joint Damping), Cube (Friction Stop).

BibTeX

@inproceedings{yu2026dprism,
  title     = {D-Prism: Differentiable Primitives for Structured Dynamic Modeling},
  author    = {Yu, Xingyuan and Li, Yijin and Zeng, Chong and Ming, Yuhang and Bao, Hujun and Zhang, Guofeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}