CG-SLAM

Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

arXiv 2024

1State Key Lab of CAD & CG, Zhejiang University, 2ZJU-UIUC Institute, International Campus, Zhejiang University
CG-SLAM teaser figure.

CG-SLAM, which adopts a well-designed 3D Gaussian field, simultaneously achieves state-of-the-art performance in localization, reconstruction, and rendering. Benefiting from the 3D Gaussian representation and a new GPU-accelerated framework developed from a thorough derivative analysis of camera pose in 3D Gaussian Splatting, CG-SLAM renders extremely fast and resolves the long-standing efficiency bottleneck of previous rendering-based SLAM methods.
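The derivative analysis of camera pose mentioned above can be sketched with the standard 3D Gaussian Splatting chain rule; the notation below is generic 3DGS convention, not copied from the paper:

```latex
% Hedged sketch: C is a rendered pixel color, \xi \in \mathfrak{se}(3)
% the camera pose, and \mu'_i, \Sigma'_i the projected 2D mean and
% covariance of Gaussian i. The pose gradient flows through both the
% projected means and the projected covariances:
\frac{\partial C}{\partial \xi}
  = \sum_i \left(
      \frac{\partial C}{\partial \mu'_i}\,
      \frac{\partial \mu'_i}{\partial \xi}
    + \frac{\partial C}{\partial \Sigma'_i}\,
      \frac{\partial \Sigma'_i}{\partial \xi}
    \right)
```

Implementing these terms directly in the rasterizer, rather than relying on generic autodiff, is what enables the GPU-accelerated tracking framework.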

Abstract

Recently, neural radiance fields (NeRFs) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available.
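The idea of selecting valuable primitives via a per-Gaussian uncertainty can be caricatured as a simple threshold filter. This is a minimal sketch under assumed names and an assumed threshold rule, not the paper's actual uncertainty model:

```python
import numpy as np

def select_informative(means, uncertainties, max_uncertainty=0.1):
    """Keep only Gaussians whose learned uncertainty is below a threshold,
    so pose optimization leans on stable, well-observed geometry.
    The hard threshold is an illustrative assumption; the actual model
    learns uncertainty jointly with the field."""
    mask = uncertainties < max_uncertainty
    return means[mask], mask

# Toy example: three primitives, one with high uncertainty.
means = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0]])
unc = np.array([0.05, 0.5, 0.02])
kept, mask = select_informative(means, unc)
# two of the three primitives survive the filter
```

Filtering before the tracking step shrinks the optimization problem, which is one route to the reported efficiency gains.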

Video


Framework Overview

CG-SLAM Overview.

In a 3D Gaussian field constructed from an RGB-D sequence, we can render color, depth, opacity, and uncertainty maps through a GPU-accelerated rasterizer. Additionally, we attach a new uncertainty property to each Gaussian primitive to filter informative primitives. In the mapping process, we utilize multiple rendering results to design effective loss functions towards a consistent and stable Gaussian field. Subsequently, we employ appearance and geometry cues to perform accurate and efficient tracking.
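Rendering color, depth, opacity, and uncertainty maps from one rasterizer pass amounts to front-to-back alpha compositing over depth-sorted Gaussians, accumulating a different per-Gaussian attribute for each map. The sketch below shows this for a single pixel; the names and the choice to composite uncertainty with the same weights as color are assumptions, not the paper's exact formulation:

```python
import numpy as np

def composite_pixel(colors, depths, uncerts, alphas):
    """Front-to-back alpha compositing for one pixel over Gaussians
    sorted near-to-far. Each attribute (color, depth, uncertainty)
    is accumulated with the same compositing weight w_i = T_i * a_i."""
    T = 1.0                      # transmittance so far
    C = np.zeros(3)              # accumulated color
    D = 0.0                      # accumulated depth
    U = 0.0                      # accumulated uncertainty
    O = 0.0                      # accumulated opacity
    for c, d, u, a in zip(colors, depths, uncerts, alphas):
        w = T * a                # weight of this Gaussian at the pixel
        C += w * c
        D += w * d
        U += w * u
        O += w
        T *= (1.0 - a)
        if T < 1e-4:             # early termination once nearly opaque
            break
    return C, D, O, U

# Two Gaussians, each half-opaque at this pixel:
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
C, D, O, U = composite_pixel(colors,
                             np.array([1.0, 2.0]),
                             np.array([0.1, 0.2]),
                             np.array([0.5, 0.5]))
```

The rendered depth and opacity maps feed the geometric losses during mapping, while the uncertainty map supports primitive filtering for tracking.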


Tracking and Online Third-person View Rendering

Annotations: (1) Camera tracking is performed on a 340×600 image sequence; the video is played at 3.5× speed. (2) Following the moving camera, CG-SLAM supports real-time third-person view rendering.



Dense Reconstruction

Annotations: Room tour through dense reconstruction meshes from CG-SLAM and Co-SLAM. It can be clearly seen that CG-SLAM recovers a finer-grained mesh map.


Reconstruction Evaluation

Reconstruction evaluation results.

It can be observed that our method reconstructs a detailed mesh map while ensuring efficiency. CG-SLAM demonstrates mapping performance competitive with the state-of-the-art NeRF-based Point-SLAM, far surpassing other NeRF-based methods. It is worth noting that our Gaussian-based representation has neither a global MLP nor a fully covering feature grid, as Co-SLAM does. Consequently, the system exhibits a slightly weaker hole-filling ability in unobserved areas.


Rendering Evaluation

Rendering results on Replica scenes.

Annotations: We show our rendering results on Replica scenes. Our rendered images closely resemble the ground-truth ones, demonstrating that CG-SLAM achieves highly photorealistic rendering.

CG-SLAM yields more photorealistic renderings than existing NeRF-based SLAM systems and concurrent Gaussian-based works. It requires fewer optimization iterations than Point-SLAM, the most photorealistic NeRF-based method, and concurrent methods, while achieving better rendering quality. Additionally, relying on 3D Gaussian rasterization, our method renders at up to 770 FPS in our tests.


Efficiency Evaluation

Efficiency evaluation results.

We report the tracking and mapping efficiency in terms of per-iteration time consumption and the total number of optimization iterations. The GPU-accelerated rasterizer and carefully designed pipeline allow our system to be extended to a lightweight version, which works on half-resolution images and tracks twice as fast as the full version at the cost of a slight decrease in accuracy. In addition, the lightweight version effectively reduces memory consumption, a common problem in other concurrent works, while maintaining accuracy.


BibTeX

@article{hu2024cg,
    title={CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field},
    author={Hu, Jiarui and Chen, Xianhao and Feng, Boyin and Li, Guanglin and Yang, Liangjing and Bao, Hujun and Zhang, Guofeng and Cui, Zhaopeng},
    journal={arXiv preprint arXiv:2403.16095},
    year={2024}
}