D-Prism: Differentiable Primitives for Structured Dynamic Modeling

CVPR 2026

¹State Key Lab of CAD & CG, Zhejiang University; ²Stanford University; ³Hangzhou Dianzi University

D-Prism is a novel framework that uses structured primitives to reconstruct dynamic geometry from monocular inputs.

[Teaser comparison. Panels: Ground Truth, DG-Mesh (Mesh), DG-Mesh (Rendered), Ours (Primitives), Ours (Rendered)]

Abstract

Capturing both geometry and rigid motion for structured dynamic objects, like multi-part assemblies or jointed mechanisms, remains a key challenge. Existing dynamic methods, such as deformable meshes or 3DGS, rely on unstructured representations and fail to jointly model suitable geometry and articulated motion. Primitive-based methods excel at structured static scenes, but their dynamic potential is still unexplored. We propose D-Prism, the first framework to achieve high-fidelity structured dynamic modeling by extending differentiable primitives to the dynamic domain. Specifically, we bind 3DGS to primitive surfaces, leveraging their respective strengths in appearance and geometry. We introduce a deformation network to control primitive motion, ensuring it accurately matches the object's movement. Furthermore, we design a novel adaptive control strategy to dynamically adjust primitive counts, better matching objects' true spatial footprint. Experiments confirm that our method excels at structured dynamic modeling, providing both structured geometry and precise motion tracking.

Pipeline

[Figure: D-Prism pipeline overview]

Overview of D-Prism. Given calibrated monocular images and masks, our method learns the structured dynamic geometry and appearance for the sequence. Starting from randomly initialized primitives with learnable shape, scale, opacity, and motion parameters, deformation networks (backward and forward mapping) model the object's underlying motion and drive the primitives across timesteps. Meanwhile, a novel Primitive Adaptive Control strategy manages their count and distribution via merge, clone, and prune operations to enhance our framework's representational ability. Finally, 3D Gaussians are bound to primitive surfaces for high-quality rasterization, supervised by RGB and mask losses.
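The binding between Gaussians and primitives described above can be illustrated with a minimal sketch. It assumes each Gaussian center is stored as a fixed offset in its primitive's local frame, and that the primitive's pose at a given timestep is a rotation about the z-axis plus a translation. The names `rotate_z` and `gaussian_centers_at_time` and the pose parameterization are hypothetical. In D-Prism the pose comes from the deformation networks, whose details are not given here.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def gaussian_centers_at_time(local_offsets, angle, translation):
    """World-space centers of the Gaussians attached to one primitive.

    local_offsets: points on the primitive surface, in the primitive's frame.
    angle, translation: the primitive's pose at the queried timestep
    (hypothetical stand-ins for the output of a deformation network).
    """
    centers = []
    for p in local_offsets:
        rx, ry, rz = rotate_z(p, angle)
        centers.append((rx + translation[0], ry + translation[1], rz + translation[2]))
    return centers

# Two Gaussians on a lid-like primitive, rotated by 90 degrees and lifted by 1.
offsets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = gaussian_centers_at_time(offsets, math.pi / 2, (0.0, 0.0, 1.0))
```

Because the offsets stay fixed in the local frame, the Gaussians follow the primitive rigidly. This is one simple way to keep appearance tied to structured geometry.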

Quantitative Results

Structured Motion Tracking Results on Dynamic Primitive Dataset

[Table: Motion Tracking]

Structured motion tracking results. We evaluate geometry motion tracking accuracy using End-Point-Error (EPE) and accuracy thresholds (δ3D) across six structured dynamic object categories. Our method demonstrates superior or highly competitive performance compared to Shape of Motion, Ub4D, MovingParts, SP-GS, and DG-Mesh.
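For readers unfamiliar with these metrics, the following sketch shows how EPE and a threshold accuracy are commonly computed for tracked 3D points. The threshold values (0.05 and 0.25) and the function names are assumptions for illustration. The exact δ3D definition used in the evaluation is not stated here.

```python
import math

def end_point_error(pred, gt):
    """Return the mean Euclidean error and the per-point errors."""
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors), errors

def accuracy_within(errors, threshold):
    """Fraction of points whose error is below `threshold` (assumed delta-style metric)."""
    return sum(e < threshold for e in errors) / len(errors)

# Toy tracked points at one timestep (made-up values, not results from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.3)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.04), (0.0, 2.0, 0.2), (0.0, 0.0, 0.0)]

epe, errors = end_point_error(pred, gt)
acc_05 = accuracy_within(errors, 0.05)  # assumed tight threshold
acc_25 = accuracy_within(errors, 0.25)  # assumed loose threshold
```

Lower EPE is better, while higher threshold accuracy is better. Reporting both separates average drift from the share of points that are tracked closely.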

Structured Dynamic Reconstruction Results on Dynamic Primitive Dataset

[Table: Dynamic Reconstruction]

Dynamic geometry reconstruction results. We compare our method against Ub4D and DG-Mesh using Dynamic Chamfer Distance (CDd) and Dynamic Earth Mover's Distance (EMDd) metrics across six structured dynamic object categories. Our method achieves the best results on most scenes.
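As a rough illustration of a dynamic Chamfer metric, the sketch below computes a symmetric Chamfer distance per frame and averages it over the sequence. Two choices here are assumptions and may differ from the evaluation protocol: averaging over frames and using unsquared nearest-neighbor distances. The function names are hypothetical, and the brute-force search is only suitable for small point sets.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def dynamic_chamfer(pred_frames, gt_frames):
    """Average the per-frame Chamfer distance over all timesteps (assumed aggregation)."""
    per_frame = [chamfer_distance(p, g) for p, g in zip(pred_frames, gt_frames)]
    return sum(per_frame) / len(per_frame)

# Two frames of a toy sequence: the first matches exactly, the second is shifted by 0.1.
gt_frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred_frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]]

cd_dynamic = dynamic_chamfer(pred_frames, gt_frames)
```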

Visualization Results

Dynamic Primitive Dataset

[Figure panels, each shown as Primitives and Rendered: Treasure Box, Rubik's Cube, Door, Pliers, Folding Chair, Sunglasses]

Real-World Robotic Scenes

[Figure panels, each shown as GT, Primitives, and Rendered: Dexterous Hand, Robotic Arm]

D-NeRF Dataset

[Figure panels, each shown as Primitives and Rendered: Hell Warrior, Hook, Jumping Jacks, Mutant, Stand Up]

Applications

Our method provides a structured geometric representation that enables easy editing of the reconstructed results. Below we demonstrate several applications powered by our primitive-based representation.

Motion Pattern Transfer

By exchanging motion parameters between different objects' primitives, we can transfer motion patterns across objects. For example, we transfer the opening motion of the Treasure Box lid to the top layer of the Rubik's Cube, or apply the rotation of the Rubik's Cube to the Treasure Box lid.
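The idea of exchanging motion parameters can be sketched with a toy data structure. Here each primitive carries a `motion` field, reduced to a rotation axis and a per-step angle. The `Primitive` class, its fields, and `transfer_motion` are hypothetical simplifications. The actual motion parameterization in D-Prism is learned and is not specified here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Primitive:
    name: str
    center: tuple   # geometry stays with the primitive
    motion: tuple   # hypothetical motion parameters: (rotation axis, radians per step)

def transfer_motion(source, target):
    """Return a copy of `target` that keeps its geometry but uses `source`'s motion."""
    return replace(target, motion=source.motion)

box_lid = Primitive("box_lid", (0.0, 0.0, 1.0), ("x", 0.05))
cube_top = Primitive("cube_top", (0.0, 0.0, 0.5), ("z", 0.10))

# Swap the motion patterns while leaving each primitive's geometry untouched.
cube_with_box_opening = transfer_motion(box_lid, cube_top)
box_with_cube_rotation = transfer_motion(cube_top, box_lid)
```

Keeping geometry and motion in separate fields is what makes the swap a one-line operation. An unstructured representation has no per-part motion handle to exchange.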

[Figure panels, each shown as Primitives and Rendered: Box with Cube's rotation, Cube with Box's opening]

Effortless Articulation

Leveraging our structured representation, we can control limb movements in Blender by simply setting up a skeleton and Inverse Kinematics (IK) parameters based on primitive parent-child relationships, which eliminates the need for skinning. This makes the articulation of any reconstructed subject remarkably straightforward.

[Figure panels: Primitives, Rendered]

Exploded View

Since each primitive represents a distinct part, we can easily create exploded-view visualizations by translating primitives outward from the object center, revealing the internal structure of the reconstruction.
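A minimal sketch of this translation, assuming each primitive is summarized by its center and the object center is the mean of those centers. The function name `exploded_centers` and the `factor` parameter are hypothetical.

```python
def exploded_centers(centers, factor):
    """Push each primitive center away from the object centroid.

    factor = 0 leaves the object unchanged. Larger values spread parts further.
    """
    n = len(centers)
    centroid = tuple(sum(c[i] for c in centers) / n for i in range(3))
    return [
        tuple(c[i] + factor * (c[i] - centroid[i]) for i in range(3))
        for c in centers
    ]

# Four part centers arranged around the origin.
centers = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
exploded = exploded_centers(centers, 0.5)
```

Each part moves along the ray from the centroid through its own center, so the relative arrangement of parts is preserved while gaps open between them.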

[Figure panels: Primitives, Rendered]

Physics-based Motion Simulation

By controlling primitive motion parameters, we can simulate physics-based effects on individual object parts. For instance, we simulate the Rubik's Cube rotating upon impact and gradually stopping due to friction, or the Treasure Box lid accelerating open under force and slowly settling back due to joint damping.
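The friction-stop effect can be approximated with a very small simulation. The sketch assumes a single rotating part with a non-negative initial angular velocity, constant frictional deceleration, and explicit Euler integration. The function name and all parameter values are hypothetical and are not taken from the paper.

```python
def simulate_friction_stop(omega0, friction_decel, dt, steps):
    """Angle trajectory of a part set spinning by an impact and slowed by friction.

    omega0: initial angular velocity in rad/s (assumed non-negative).
    friction_decel: constant angular deceleration in rad/s^2.
    """
    angle, omega = 0.0, omega0
    angles = []
    for _ in range(steps):
        angle += omega * dt                            # advance the rotation
        omega = max(0.0, omega - friction_decel * dt)  # friction never reverses the spin
        angles.append(angle)
    return angles

angles = simulate_friction_stop(omega0=2.0, friction_decel=1.0, dt=0.5, steps=8)
```

The resulting angle sequence could then be written into the rotation parameter of the relevant primitive, one value per frame. A damped joint such as the box lid would use a velocity-proportional term in place of the constant deceleration.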

[Figure panels, each shown as Primitives and Rendered: Box (Joint Damping), Cube (Friction Stop)]

BibTeX

@inproceedings{yu2026dprism,
  title     = {D-Prism: Differentiable Primitives for Structured Dynamic Modeling},
  author    = {Yu, Xingyuan and Li, Yijin and Zeng, Chong and Ming, Yuhang and Bao, Hujun and Zhang, Guofeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}