PATS: Patch Area Transportation with Subdivision for Local Feature Matching

Abstract

TL;DR: PATS can extract high-quality semi-dense matches even under severe scale variations
and in indistinctive regions with low texture or repetitive patterns.

Local feature matching aims at establishing sparse correspondences between a pair of images. Recently, detector-free methods have generally performed better, but they remain unsatisfactory on image pairs with large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to tackle this issue. Instead of building an expensive image pyramid, we start by splitting the original image pair into equal-sized patches and gradually resizing and subdividing them into smaller patches with the same scale. However, estimating scale differences between these patches is non-trivial since the scale differences are determined by both relative camera poses and scene structures, and thus vary spatially over image pairs. Moreover, it is hard to obtain the ground truth for real scenes. To this end, we propose patch area transportation, which enables learning scale differences in a self-supervised manner. In contrast to bipartite graph matching, which only handles one-to-one matching, our patch area transportation can deal with many-to-many relationships. PATS improves both matching accuracy and coverage, and shows superior performance in downstream tasks, such as relative pose estimation, visual localization, and optical flow estimation.

Pipeline Overview

We a) extract features for patches. Then, we b) formulate the patch area transportation by setting the source patches' area $\mathbf a_S$ as $\mathbf 1_N$, regressing the target patches' area $\mathbf a_T$, and bounding the transportation via visual similarities $\mathbf C$. The feature descriptors $\mathbf f$ that produce $\mathbf C$ and the area regression $\mathbf a_T$ are learned by solving this problem differentiably. The solution of this problem, $\mathbf P$, also reveals many-to-many patch relationships. Based on $\mathbf P$, we c) find corresponding regions, represented by target patches inside a bounding box $B_i$, for each source patch. The exact corresponding position $\hat{\mathbf p}_i$ of each patch is the position expectation over $B_i$. After cropping and resizing image contents according to the obtained window sizes, which aligns the contents to the same scale, we d) subdivide the cropped contents into smaller patches and enter the next iteration.
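Steps b) and c) can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a plain entropy-regularised transport solved by Sinkhorn iterations with equality marginals, and it rescales the target areas so that total mass matches the source side. The names `sinkhorn_transport`, `expected_position`, `eps`, `n_iters`, and the mask `in_box` (standing in for membership in $B_i$) are hypothetical, and the similarities and areas are toy values.

```python
import numpy as np

def sinkhorn_transport(C, a_S, a_T, eps=0.5, n_iters=500):
    """Approximate the plan P with row sums a_S and column sums a_T that
    favours high-similarity pairs (entropic OT; an assumed solver)."""
    K = np.exp(C / eps)              # turn similarities into a positive kernel
    u = np.ones_like(a_S)
    v = np.ones_like(a_T)
    for _ in range(n_iters):
        u = a_S / (K @ v)            # rescale rows to match source areas
        v = a_T / (K.T @ u)          # rescale columns to match target areas
    return u[:, None] * K * v[None, :]

def expected_position(P_row, centers, in_box):
    """Position expectation over the target patches selected by in_box."""
    w = P_row * in_box               # keep only mass inside the bounding box
    return (w[:, None] * centers).sum(axis=0) / w.sum()

# Toy problem: N = 4 source patches, M = 3 target patches.
rng = np.random.default_rng(0)
N, M = 4, 3
C = rng.uniform(0.0, 1.0, size=(N, M))      # stand-in visual similarities
a_S = np.ones(N)                            # source areas fixed to 1_N
a_T_raw = np.array([2.0, 1.0, 1.0])         # stand-in for regressed areas
a_T = a_T_raw * (a_S.sum() / a_T_raw.sum()) # assumption: balance total mass

P = sinkhorn_transport(C, a_S, a_T)

centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])  # target patch centers
in_box = np.array([1.0, 1.0, 0.0])          # hypothetical B_0 membership mask
p_hat = expected_position(P[0], centers, in_box)
```

Each row of `P` spreads one unit of source area over several target patches, which is how a many-to-many relationship shows up in this sketch; `p_hat` is the mass-weighted mean of the target centers inside the box.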

Qualitative Comparison with LoFTR

Two-view Reconstruction

Since PATS achieves high-precision matches that are densely and uniformly distributed in the images, we can obtain a semi-dense reconstruction by simply triangulating the matches in an image pair.
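As an illustration of triangulating a single match, the sketch below uses the standard linear (DLT) method. It assumes the two 3x4 camera projection matrices are already known, which the text above does not specify; `triangulate_dlt`, `project`, and the toy cameras are hypothetical names and values.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one match (x1 in image 1, x2 in image 2)."""
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy setup: identity camera and a second camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

Applying the same function to every match of a pair would yield a semi-dense point cloud; with noisy matches, a robust or nonlinear refinement step is usually added on top.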

Understanding the Area Transportation in PATS

PATS can directly detect the region corresponding to each patch in the source image and compute an accurate area for it. We can therefore generate correct new image pairs by resizing the corresponding contents to the same scale, and then match them in more detail in the following iterations. This reduces one large, hard matching problem to many small, simple ones.
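The link between an estimated area and the resizing step can be sketched as follows. The only fact relied on is geometric: if a source patch of unit area corresponds to a target region of area $a$, the linear scale factor is $\sqrt{a}$. Everything else is an assumption for illustration: the square window centered on the matched position, nearest-neighbour sampling, and the names `crop_and_resize`, `center`, `area`, and `patch_size`.

```python
import numpy as np

def crop_and_resize(image, center, area, patch_size):
    """Crop a square window around center (x, y) whose area is `area` patch
    areas, and resample it to patch_size x patch_size (nearest neighbour)."""
    side = patch_size * np.sqrt(area)       # linear scale is sqrt of area ratio
    # Sample points at the centers of a patch_size grid laid over the window.
    offsets = (np.arange(patch_size) + 0.5) / patch_size * side - side / 2.0
    ys = np.floor(center[1] + offsets).astype(int)
    xs = np.floor(center[0] + offsets).astype(int)
    # Clamp to the image so windows near the border stay valid.
    ys = np.clip(ys, 0, image.shape[0] - 1)
    xs = np.clip(xs, 0, image.shape[1] - 1)
    return image[np.ix_(ys, xs)]

image = np.arange(64 * 64).reshape(64, 64)

# Area 1: the window is exactly one patch, so this is a plain crop.
same_scale = crop_and_resize(image, center=(32, 32), area=1.0, patch_size=8)

# Area 4: the window is twice as wide, so every second pixel is sampled.
downscaled = crop_and_resize(image, center=(32, 32), area=4.0, patch_size=8)
```

Both outputs have the same size, so the source patch and the resampled target content can be subdivided and matched at a common scale in the next iteration.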

BibTeX

@inproceedings{pats,
  title={PATS: Patch Area Transportation with Subdivision for Local Feature Matching},
  author={Junjie Ni and Yijin Li and Zhaoyang Huang and Hongsheng Li and Hujun Bao and Zhaopeng Cui and Guofeng Zhang},
  booktitle={The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
  year={2023}
}