From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang¹,², Hailin Yu²†, Yichun Shentu², Jin Yuan², Guofeng Zhang¹†

¹State Key Lab of CAD&CG, Zhejiang University ²SenseTime Research
† Corresponding Authors

CVPR 2025

Feature Gaussian Scene Representation

Radiance Field & Feature Field & Landmarks

Abstract

This paper presents a novel camera relocalization method, STDLoc, which leverages Feature Gaussian as its scene representation. STDLoc is a full relocalization pipeline that achieves accurate relocalization without relying on any pose prior. Unlike previous coarse-to-fine localization methods, which require image retrieval first and then feature matching, we propose a novel sparse-to-dense localization paradigm. Based on this scene representation, we introduce a novel matching-oriented Gaussian sampling strategy and a scene-specific detector to achieve efficient and robust initial pose estimation. Furthermore, based on the initial localization results, we align the query feature map to the Gaussian feature field by dense feature matching to enable accurate localization. Experiments on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art localization methods in terms of localization accuracy and recall.

Overview

Matching-Oriented Sampling

Each Gaussian is assigned a matching score, and anchors are then sampled. For each anchor, the k nearest Gaussians are identified by spatial distance, and the highest-scoring Gaussian among them is selected as a landmark.

Scene-Specific Detector

We train our scene-specific detector independently for each scene. The image above presents the detected keypoints obtained from both the SuperPoint detector and our scene-specific detector.
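The matching-oriented sampling step described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function name `sample_landmarks`, the choice of uniform random anchors drawn from the Gaussian centers, and the brute-force distance computation are all assumptions, since the page does not say how anchors are sampled or how neighbors are searched.

```python
import numpy as np

def sample_landmarks(positions, scores, num_anchors, k, seed=0):
    """Pick one landmark per anchor: the best-scoring Gaussian among its k nearest.

    positions: (N, 3) Gaussian centers; scores: (N,) matching scores.
    Anchors are drawn uniformly at random from the centers (an assumption).
    Returns the sorted, de-duplicated indices of the selected Gaussians.
    """
    rng = np.random.default_rng(seed)
    n = positions.shape[0]
    k = min(k, n)
    anchor_idx = rng.choice(n, size=min(num_anchors, n), replace=False)

    # Squared distances from every anchor to every Gaussian, shape (A, N).
    # Brute force is fine for a sketch; a KD-tree would scale better.
    diff = positions[anchor_idx, None, :] - positions[None, :, :]
    d2 = np.einsum("anc,anc->an", diff, diff)

    # Indices of the k spatially nearest Gaussians for each anchor, shape (A, k).
    knn = np.argpartition(d2, k - 1, axis=1)[:, :k]

    # Within each neighborhood, keep the Gaussian with the highest score.
    best = np.argmax(scores[knn], axis=1)
    selected = knn[np.arange(len(anchor_idx)), best]
    return np.unique(selected)
```

With `k` equal to the number of Gaussians, every anchor sees the whole scene and the result collapses to the single best-scoring Gaussian; with `k=1`, each anchor selects itself. Intermediate values trade spatial coverage against matching score.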

STDLoc Pipeline

(a) Sparse Stage: Sparse feature matching is performed between the features of the sampled landmarks and the features at the detected keypoints. The initial pose is then estimated from the resulting 2D-3D correspondences.

(b) Dense Stage: Based on the initial pose, a dense feature map and a depth map are rendered from the full Feature Gaussian, followed by coarse-to-fine dense feature matching to obtain dense correspondences. This stage can be iterated to refine the pose.
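The two stages above can be illustrated with a pair of small helpers. This is a sketch under stated assumptions rather than the STDLoc code: `mutual_nn_matches` assumes the sparse stage uses mutual nearest-neighbor matching on cosine similarity, and `backproject_depth` shows one plausible way the rendered depth map could lift rendered pixels to 3D so that dense 2D-2D matches become 2D-3D correspondences. Both function names, the pinhole intrinsics `K`, and the camera-to-world convention are hypothetical; the page does not specify them. Pose estimation itself (for example PnP with RANSAC) is left out.

```python
import numpy as np

def mutual_nn_matches(kp_feats, lm_feats):
    """Match query keypoint features to landmark features.

    kp_feats: (Q, D) features at detected keypoints in the query image.
    lm_feats: (L, D) features of the sampled landmarks.
    Returns an (M, 2) array of (keypoint_index, landmark_index) pairs that
    are each other's nearest neighbor under cosine similarity.
    """
    a = kp_feats / np.linalg.norm(kp_feats, axis=1, keepdims=True)
    b = lm_feats / np.linalg.norm(lm_feats, axis=1, keepdims=True)
    sim = a @ b.T                      # (Q, L) cosine similarities
    nn_ab = sim.argmax(axis=1)         # best landmark for each keypoint
    nn_ba = sim.argmax(axis=0)         # best keypoint for each landmark
    kp_idx = np.arange(sim.shape[0])
    keep = nn_ba[nn_ab] == kp_idx      # keep only mutual agreements
    return np.stack([kp_idx[keep], nn_ab[keep]], axis=1)

def backproject_depth(depth, K, cam_to_world):
    """Lift a rendered depth map to per-pixel 3D points in world coordinates.

    depth: (H, W) z-depth rendered at the current pose estimate.
    K: (3, 3) pinhole intrinsics; cam_to_world: (4, 4) camera-to-world pose.
    Returns an (H, W, 3) array of world-space points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T    # pixel -> camera ray with z = 1
    pts_cam = rays * depth.reshape(-1, 1)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return (pts_cam @ R.T + t).reshape(h, w, 3)
```

In a pipeline like the one described, the index pairs from the first helper would pair 2D keypoint locations with 3D landmark positions for the initial pose solve, and the point map from the second would supply the 3D side of the dense correspondences at each refinement iteration.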

Qualitative Comparison

Indoor

Outdoor

Note: Since ACE and HLoc do not support novel view synthesis, we use our Feature Gaussian for visualization.

BibTeX

@inproceedings{huang2025stdloc,
  title={From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from {Feature Gaussian Splatting}},
  author={Huang, Zhiwei and Yu, Hailin and Shentu, Yichun and Yuan, Jin and Zhang, Guofeng},
  booktitle={CVPR},
  pages={},
  year={2025}
}