Mobile3DScanner: An Online 3D Scanner for High-quality Object Reconstruction with a Mobile Device

Special Issue of IEEE Transactions on Visualization and Computer Graphics for ISMAR 2021

ISMAR 2021 Best Journal Paper Nominee

Xiaojun Xiang¹, Hanqing Jiang¹, Guofeng Zhang²*, Yihao Yu¹, Chenchen Li¹, Xingbin Yang¹, Danpeng Chen¹, Hujun Bao²⁺

¹SenseTime Research ²State Key Lab of CAD & CG, Zhejiang University
* denotes equal contribution and joint first authorship
⁺ denotes corresponding author

Abstract

We present a novel online 3D scanning system for high-quality object reconstruction with a mobile device, called Mobile3DScanner. Using a mobile device equipped with an embedded RGBD camera, our system provides online 3D object reconstruction, allowing users to acquire high-quality textured 3D object models. Starting with a simultaneous pose tracking and TSDF fusion module, our system allows users to scan an object with a mobile device and get a 3D model for real-time preview. After the real-time scanning process is completed, the scanned 3D model is globally optimized and mapped with multi-view textures as an efficient post-process to get the final textured 3D model on the mobile device. Unlike most existing state-of-the-art systems, which can only scan small household objects such as toys due to the limited computation and memory resources of mobile platforms, our system can reconstruct objects with large dimensions such as statues. We propose a novel visual-inertial ICP approach to achieve real-time, accurate 6DoF pose tracking of each incoming frame on the front end, while maintaining a keyframe pool on the back end in which the keyframe poses are optimized by local bundle adjustment (BA). Simultaneously, the keyframe depth maps are fused into a TSDF model in real time using the optimized poses. In particular, we propose a novel adaptive voxel resizing strategy to solve the out-of-memory problem of large-dimension TSDF fusion on mobile platforms. In the post-process, the keyframe poses are globally optimized, and the keyframe depth maps are optimized and fused to obtain a final object model with more accurate geometry. Quantitative and qualitative experiments demonstrate the effectiveness of the proposed mobile 3D scanning system, which can successfully achieve online high-quality 3D reconstruction of natural objects with larger dimensions for efficient AR content creation.

System overview

To scan a natural object with our system, the user places the object on a horizontal planar surface such as a desk or the ground. As the user scans the object with a mobile device that has a rear RGBD camera, our pipeline tracks the 6DoF camera poses in real time using a visual-inertial ICP (VI-ICP) approach, combined with local BA and loop closure to refine the keyframe poses. The object is consistently segmented by a spatio-temporal planar surface tracking method. Simultaneously, the incoming depths are fused into a TSDF model for real-time preview, using an adaptive voxel resizing strategy. In the post-process, all keyframe poses are optimized by global bundle adjustment (GBA) and the keyframe depths are refined by semi-global matching (SGM). The refined depths are then fused into a TSDF model, followed by Marching Cubes, Poisson Surface Reconstruction (PSR), Shape from Shading (SFS), and multi-view texture mapping to produce the final 3D mesh.
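
The real-time fusion step can be illustrated with the standard per-voxel TSDF update, a weighted running average of truncated signed distances as popularized by KinectFusion. The sketch below is a minimal, hypothetical illustration for voxels sampled along a single camera ray; the function names, the truncation distance `trunc`, and the weight cap `max_weight` are assumptions, and the exact update rule used by Mobile3DScanner is not stated here.

```python
def integrate_voxel(tsdf, weight, voxel_depth, measured_depth, trunc, max_weight=64.0):
    """Fuse one depth observation into one voxel (hypothetical helper).

    voxel_depth: depth of the voxel center along the camera's viewing axis.
    measured_depth: depth-map value at the pixel the voxel projects to (<= 0 means invalid).
    Returns the updated (tsdf, weight) pair.
    """
    if measured_depth <= 0.0:
        return tsdf, weight  # no valid measurement for this pixel
    sdf = measured_depth - voxel_depth  # projective signed distance, positive in front of the surface
    if sdf < -trunc:
        return tsdf, weight  # voxel lies far behind the observed surface: treat as occluded
    observation = min(1.0, sdf / trunc)  # truncate and normalize to [-1, 1]
    new_weight = weight + 1.0
    fused = (tsdf * weight + observation) / new_weight  # weighted running average
    return fused, min(new_weight, max_weight)


def integrate_ray(voxel_depths, tsdfs, weights, measured_depth, trunc):
    """Update, in place, every voxel sampled along one camera ray."""
    for idx, voxel_depth in enumerate(voxel_depths):
        tsdfs[idx], weights[idx] = integrate_voxel(
            tsdfs[idx], weights[idx], voxel_depth, measured_depth, trunc)


# Five voxels along one ray, a surface observed at depth 1.0 m, truncation 0.5 m.
depths = [0.5, 0.75, 1.0, 1.5, 2.0]
tsdfs = [1.0] * 5
weights = [0.0] * 5
integrate_ray(depths, tsdfs, weights, measured_depth=1.0, trunc=0.5)
print(tsdfs)    # [1.0, 0.5, 0.0, -1.0, 1.0]
print(weights)  # [1.0, 1.0, 1.0, 1.0, 0.0]
```

The voxel at 2.0 m lies beyond the truncation band behind the surface, so it keeps weight 0 and stays marked as unobserved until another frame sees it.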

Visual-Inertial ICP Tracking

ICP tracking results on case "David". (a) Original ICP in Open3D. (b) Our VI-ICP approach. (c), (d) VI-ICP combined with LBA and loop closure, respectively.

Our system localizes the camera through a loosely coupled integration of ICP and IMU. The IMU module is initialized and continuously optimized with the ICP tracking results, and it provides a pose prediction for the current frame. The rotation component of the predicted pose and the gravity direction are integrated into our ICP energy term to enhance tracking robustness.
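
To make the idea concrete, the following hypothetical sketch evaluates an ICP energy augmented with two IMU-derived terms: a point-to-plane data term, a penalty on the deviation from the IMU-predicted rotation, and a gravity-alignment penalty. The paper's exact energy and its weights are not given here, so this formulation (squared geodesic angle for the rotation prior, squared vector difference for gravity, weights `w_rot` and `w_grav`) is an assumption. It only evaluates the energy and omits the Gauss-Newton solver a real tracker would run.

```python
import math


def mat_vec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle(R):
    """Geodesic angle (radians) of a rotation matrix, recovered from its trace."""
    c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, c)))


def vi_icp_energy(R, t, src, dst, dst_normals, R_pred, g_cam, g_world, w_rot=1.0, w_grav=1.0):
    """Hypothetical VI-ICP energy; (R, t) maps camera-frame points into the model frame."""
    # Point-to-plane ICP term over matched (source point, model point, model normal) triples.
    e_icp = 0.0
    for p, q, n in zip(src, dst, dst_normals):
        p_model = [a + b for a, b in zip(mat_vec(R, p), t)]
        r = dot([a - b for a, b in zip(p_model, q)], n)
        e_icp += r * r
    # Rotation prior: squared angle between the estimate and the IMU-predicted rotation.
    e_rot = rotation_angle(mat_mul(transpose(R_pred), R)) ** 2
    # Gravity term: IMU gravity (camera frame) rotated into the model frame vs. world gravity.
    g_model = mat_vec(R, g_cam)
    e_grav = sum((a - b) ** 2 for a, b in zip(g_model, g_world))
    return e_icp + w_rot * e_rot + w_grav * e_grav


ident = rot_z(0.0)
pts = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
normals = [[0.0, 0.0, 1.0]] * 3
g = [0.0, 0.0, -1.0]
print(vi_icp_energy(ident, [0.0, 0.0, 0.0], pts, pts, normals, ident, g, g))  # 0.0
```

A rotation about the gravity axis leaves the gravity term unchanged, so in this formulation gravity only constrains roll and pitch, while yaw must come from the ICP term and the IMU rotation prediction.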

Adaptive Voxel Resizing

Adaptive voxel resizing on case “Worker”. (a) Original voxel size: 4 mm. (b) After voxel resizing: 6 mm. (c) Final 3D mesh: 4 mm.

On a mobile device, the memory usage of the TSDF should be kept under an upper limit (e.g., 200 MB) to avoid running out of memory (OOM). The adaptive voxel resizing strategy enables users to scan objects as large as possible at a proper voxel resolution, and it is triggered whenever the memory cost of the current TSDF model exceeds the limit. We then create a new TSDF volume that represents the object with a larger voxel size (e.g., 1.5 times the original voxel size), and the new TSDF values are efficiently computed by trilinear interpolation of the old voxels. For case “Worker”, voxel resizing takes 65.51 ms on an iPad Pro 2020.
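
The resizing rule can be sketched as follows: while the estimated memory exceeds a budget, resample the volume onto a grid whose spacing is 1.5 old voxels, taking each new value by trilinear interpolation. This is a minimal, hypothetical sketch that assumes a dense grid for brevity, a made-up cost of `BYTES_PER_VOXEL = 8`, and resamples only the TSDF values (weights and any rescaling of the truncation band are omitted); all names are illustrative.

```python
BYTES_PER_VOXEL = 8  # hypothetical: float32 TSDF value + float32 weight


def dims(grid):
    return len(grid), len(grid[0]), len(grid[0][0])


def trilinear(grid, x, y, z):
    """Sample a dense grid (every dimension >= 2) at continuous voxel-index coordinates."""
    nx, ny, nz = dims(grid)
    x, y, z = min(x, nx - 1.0), min(y, ny - 1.0), min(z, nz - 1.0)
    i, j, k = min(int(x), nx - 2), min(int(y), ny - 2), min(int(z), nz - 2)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def resize_tsdf(grid, scale):
    """Resample onto a coarser grid whose spacing is `scale` old voxels."""
    nx, ny, nz = dims(grid)
    mx, my, mz = (int((n - 1) / scale) + 1 for n in (nx, ny, nz))
    return [[[trilinear(grid, i * scale, j * scale, k * scale)
              for k in range(mz)] for j in range(my)] for i in range(mx)]


def memory_bytes(grid):
    nx, ny, nz = dims(grid)
    return nx * ny * nz * BYTES_PER_VOXEL


def enforce_budget(grid, voxel_mm, limit_bytes, factor=1.5):
    """Grow the voxel size by `factor` until the volume fits in `limit_bytes`."""
    while memory_bytes(grid) > limit_bytes and min(dims(grid)) >= 3:
        grid = resize_tsdf(grid, factor)
        voxel_mm *= factor
    return grid, voxel_mm


# A 7x7x7 volume of 4 mm voxels holding a field that varies linearly along x.
volume = [[[float(i) for _ in range(7)] for _ in range(7)] for i in range(7)]
resized, voxel = enforce_budget(volume, 4.0, limit_bytes=2000)
print(dims(resized), voxel)  # (5, 5, 5) 6.0
```

Trilinear interpolation reproduces a linear field exactly, so in this toy example the coarse grid samples the same surface; on real data, thin details narrower than the new voxel size are smoothed away.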

Multi-View Stereo with Depth Prior

Depth refinement on case “David”. (a) A keyframe and its two reference keyframes. (b) Original depth map from iPad Pro. (c) SGM without depth prior. (d) SGM with depth prior.

The input depths from consumer RGBD cameras, such as the dToF sensor on iPad Pro, may contain depth errors or be over-smoothed, losing geometric details. We propose to estimate more accurate depths with geometric details by SGM. The dToF depth measurements are used as depth priors to compute the weights of the multi-frame cost function, where each cost fusion weight is computed by a Gaussian function of the dToF depth prior.
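
Because the cost function itself is not reproduced on this page, the following is only a hypothetical per-pixel sketch of how a Gaussian depth-prior weight could enter a multi-frame matching cost. Assumptions: matching costs (lower is better) from the reference keyframes are averaged; the Gaussian weight w(d) = exp(-(d - d_prior)^2 / (2σ^2)) becomes an additive penalty λ(1 - w(d)); and SGM's path aggregation is replaced by a per-pixel winner-take-all. The names `sigma` and `lam` and their defaults are illustrative.

```python
import math


def prior_weight(d, d_prior, sigma):
    """Gaussian weight in (0, 1]; equals 1 when hypothesis d matches the dToF prior."""
    return math.exp(-((d - d_prior) ** 2) / (2.0 * sigma ** 2))


def fused_cost(ref_costs, d, d_prior, sigma, lam):
    """Average matching cost over reference keyframes plus a prior penalty (lower is better)."""
    photometric = sum(ref_costs) / len(ref_costs)
    return photometric + lam * (1.0 - prior_weight(d, d_prior, sigma))


def select_depth(cost_table, d_prior, sigma=0.02, lam=0.5):
    """Winner-take-all over depth hypotheses for one pixel (SGM aggregation omitted).

    cost_table maps each depth hypothesis (metres) to its per-reference matching costs.
    """
    return min(cost_table, key=lambda d: fused_cost(cost_table[d], d, d_prior, sigma, lam))


# Ambiguous photo-consistency: 1.00 m and 1.10 m match equally well; the prior breaks the tie.
ambiguous = {1.00: [0.2, 0.2], 1.05: [0.3, 0.3], 1.10: [0.2, 0.2]}
print(select_depth(ambiguous, d_prior=1.01))  # 1.0
```

In this sketch a strongly photo-consistent hypothesis can still override the prior, which is what allows matching to recover detail that an over-smoothed dToF depth misses.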

Efficient Shape-From-Shading

Shape-from-shading results on cases “David” and “Lion”. (a) 3D models by PSR. (b) 3D models with SFS.

Most existing SFS implementations are far from feasible on mobile platforms due to their limited computation and memory resources. We perform SFS directly on the mesh by optimizing an intermediate triangle normal map and then updating the vertex positions according to the optimized normal map. It takes only 3.24 seconds on the iPad Pro 2020 CPU, with an RMSE of 3.12 mm and an MAE of 2.473 mm, whereas Intrinsic3D takes 81 minutes on an Intel Core i7 7700K CPU with 32 GB RAM to achieve a slightly better result (RMSE 2.89 mm, MAE 2.16 mm).
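
The second half of that process, moving vertices so the mesh agrees with an optimized per-triangle normal map, can be sketched with a standard vertex-update rule from normal-filtering mesh denoising: x_i += (1/|F_i|) Σ_f n_f (n_f · (c_f − x_i)), where F_i are the faces around vertex i and c_f is the centroid of face f. Using this rule, and treating the target normals as already optimized inputs, are assumptions for illustration; the authors' exact solver may differ, and `update_vertices` is a hypothetical name.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_vertices(vertices, faces, target_normals, iterations=1):
    """Move vertices so each triangle's plane agrees with its target unit normal."""
    verts = [list(v) for v in vertices]
    incident = [[] for _ in verts]  # faces around each vertex
    for f_idx, face in enumerate(faces):
        for v in face:
            incident[v].append(f_idx)
    for _ in range(iterations):
        centroids = [[sum(verts[v][a] for v in face) / 3.0 for a in range(3)] for face in faces]
        new_verts = []
        for i, x in enumerate(verts):
            if not incident[i]:
                new_verts.append(x)  # isolated vertex: leave untouched
                continue
            delta = [0.0, 0.0, 0.0]
            for f in incident[i]:
                n = target_normals[f]
                s = dot(n, sub(centroids[f], x))  # offset of x from face f's target plane
                delta = [d + s * c for d, c in zip(delta, n)]
            new_verts.append([a + d / len(incident[i]) for a, d in zip(x, delta)])
        verts = new_verts  # Jacobi-style: all vertices use the previous iteration's positions
    return verts


# A square fan of four triangles whose center vertex is raised; target normals all point up.
V = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.3]]
F = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
N = [[0.0, 0.0, 1.0]] * 4
flattened = update_vertices(V, F, N)
print([round(v[2], 6) for v in flattened])  # [0.1, 0.1, 0.1, 0.1, 0.1]
```

Each vertex moves only along the target normals, so in this example the x and y coordinates stay fixed while the bump is flattened.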

Comparison with state-of-the-art methods

Comparison of our Mobile3DScanner with other state-of-the-art methods: (a) Three representative keyframes in case “La Marseillaise”. (b) Open3D. (c) KinectFusion. (d) InfiniTAM. (e) BundleFusion. (f) 3D Scanner App. (g) Our Mobile3DScanner. (h) The GT model acquired by a commercial 3D scanner.

RMSEs and MAEs of the reconstruction results of our Mobile3DScanner and other SOTA methods on our four experimental cases captured with an iPad Pro 2020, using a model of each object scanned by a commercial 3D scanner as ground truth (GT).

Detailed time costs of our Mobile3DScanner for all substeps on three cases, “Deer”, “La Marseillaise”, and “Worker”, on an iPad Pro 2020; the cases contain 487, 1098, and 1438 frames, respectively.