BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

arXiv 2025


1 State Key Lab of CAD & CG, Zhejiang University; 2 Ant Group; 3 EPFL; 4 Chongqing University; 5 Northwestern Polytechnical University; 6 Shenzhen University

Abstract


TL;DR: We propose a novel generalizable object pose estimation method that leverages object bounding box corners as an intermediate representation.

BoxDreamer Overview

BoxDreamer is a generalizable RGB-based approach for object pose estimation, specifically designed to address challenges in sparse-view settings. While existing methods can estimate the poses of unseen objects, their generalization ability remains limited in scenarios involving occlusions and sparse reference views, restricting their real-world applicability. To overcome these limitations, we introduce the corner points of the object bounding box as an intermediate representation of the object pose. The 3D object corners can be reliably recovered from sparse input views, while the 2D corner points in the target view are estimated through a novel reference-based point synthesizer, which works well even under occlusion. Because the corners act as semantic points of the object, they naturally establish 2D-3D correspondences, from which the object pose is estimated with a PnP algorithm. Extensive experiments on the YCB-Video and Occluded-LINEMOD datasets show that our approach outperforms state-of-the-art methods, highlighting the effectiveness of the proposed representation and significantly enhancing the generalization capabilities of object pose estimation, which is crucial for real-world applications. The code will be released for reproducibility.
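To illustrate why box corners are a convenient intermediate representation, the following is a minimal sketch of the final step only: recovering a pose from eight 2D-3D corner correspondences. It is not the BoxDreamer implementation. The abstract says only that "a PnP algorithm" is used, so a simple Direct Linear Transform (DLT) solver is substituted here; the function names (`box_corners`, `project`, `pose_from_corners_dlt`), the box size, and the camera intrinsics are all hypothetical, and the 2D corners are synthesized by projection rather than predicted by a network.

```python
import numpy as np


def box_corners(size):
    """Return the 8 corners of an axis-aligned box centred at the object origin."""
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array(
        [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
        dtype=float,
    )
    return signs * half  # shape (8, 3)


def project(points_3d, K, R, t):
    """Pinhole projection of object-frame points into pixel coordinates."""
    cam = points_3d @ R.T + t          # object frame -> camera frame
    uvw = cam @ K.T                    # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide


def pose_from_corners_dlt(points_3d, points_2d, K):
    """Estimate (R, t) from 2D-3D correspondences with a plain DLT.

    Assumption: this stands in for "a PnP algorithm"; it needs at least
    6 non-coplanar points, which the 8 box corners satisfy.
    """
    ones = np.ones((len(points_2d), 1))
    # Remove the intrinsics so the unknown matrix is [R | t] up to scale.
    normed = (np.linalg.inv(K) @ np.hstack([points_2d, ones]).T).T
    rows = []
    for X, (x, y, _) in zip(points_3d, normed):
        Xh = np.append(X, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    _, _, vt = np.linalg.svd(np.array(rows))
    P = vt[-1].reshape(3, 4)           # null vector = s * [R | t]
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt                         # closest orthogonal matrix
    scale = S.mean()
    if np.linalg.det(R) < 0:           # resolve the sign of the unknown scale
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t


# Hypothetical setup: a 10 x 20 x 15 cm box, 1 m in front of the camera.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, 0.5
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([0.1, -0.05, 1.0])

corners = box_corners([0.10, 0.20, 0.15])
corners_2d = project(corners, K, R_true, t_true)   # stand-in for predicted 2D corners
R_est, t_est = pose_from_corners_dlt(corners, corners_2d, K)
```

With noise-free corners the recovered pose matches the ground truth up to numerical precision. With real, noisy 2D predictions, a robust PnP solver (for example one wrapped in RANSAC) would be the more usual choice than a bare DLT.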

Qualitative Comparison on YCB-Video and Occluded-LINEMOD


BoxDreamer generalizes well on the YCB-Video and Occluded-LINEMOD datasets with only 5 reference views.

More Qualitative Results


BoxDreamer demonstrates robust performance against occlusions, outperforming state-of-the-art RGB-based generalizable object pose estimation methods.


In the Wild Examples


Try the BoxDreamer demo on Hugging Face Spaces with your own images! (Note: the bounding box is not aligned at the semantic level.)

Citation


@inproceedings{BoxDreamer2025,
    title={BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation},
    author={Yu, Yuanhong and He, Xingyi and Zhao, Chen and Yu, Junhao and Yang, Jiaqi and 
            Hu, Ruizhen and Shen, Yujun and Zhu, Xing and Zhou, Xiaowei and Peng, Sida},
    year={2025}
}