PointSplat: Compact Gaussian Splatting via Human-Centric Prediction

ECCV 2026

Yujie Guo¹, Yudong Jin¹, Lingteng Qiu³, Zehong Shen¹, Zhen Xu¹, Jing Zhang², Xianchao Shen², Hujun Bao¹, Sida Peng¹, Xiaowei Zhou¹†

¹Zhejiang University ²ByteDance ³CUHK, Shenzhen
†Corresponding author


Abstract

Producing 3D human representations from input views on the fly is essential for immersive live streaming systems, where representation compactness is as critical as high fidelity due to limited computational power and transmission bandwidth. Although recent feed-forward reconstruction methods achieve impressive quality through the view-centric prediction of 3D representations, they repeatedly encode the same subject content across multiple views, leading to significant inter-view redundancy. Our key insight is to perform predictions directly within 3D space, enabling the network to learn and produce a highly compact representation. To this end, we propose PointSplat, a novel human-centric approach that directly infers Gaussian primitives from an input point set. The proposed method first estimates a coarse geometric proxy and performs ray-casting to prune redundant points and establish explicit 2D–3D correspondences. Subsequently, it employs a Point-Image Transformer to fuse appearance and geometry features, predicting Gaussian attributes in a single forward pass. This design restricts predictions to foreground regions of interest, substantially reducing the total number of Gaussians while improving the rendering quality of novel views. Extensive experiments demonstrate that PointSplat achieves higher efficiency and quality, while exhibiting strong robustness to variations in view count and image resolution across multiple datasets.

Motivation

Comparison between view-centric and human-centric prediction. Existing view-centric methods predict pixel-aligned Gaussian maps for each input view and then unproject them into 3D space. This design inherently introduces inter-view redundancy—the same subject content is repeatedly encoded across different views, causing computation and transmission overhead to grow rapidly with the number of views and rendering resolution. In contrast, our human-centric approach directly infers Gaussian primitives in 3D space by fusing multi-view observations into a shared point set, eliminating this redundancy and producing a substantially more compact representation with far fewer points while improving novel-view synthesis quality.
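To make the redundancy argument concrete, the sketch below compares primitive counts under two simplifying assumptions that are not taken from the paper: a view-centric model emits one Gaussian per pixel per input view (before any foreground masking), and a human-centric model emits one Gaussian per point in the shared set. The function names and the point budget of 50,000 are hypothetical and chosen only for illustration.

```python
def view_centric_gaussian_count(num_views: int, height: int, width: int) -> int:
    """Assumed pixel-aligned setup: one Gaussian per pixel in every input view."""
    return num_views * height * width


def human_centric_gaussian_count(num_points: int) -> int:
    """Assumed point-anchored setup: one Gaussian per point in the shared 3D set."""
    return num_points


if __name__ == "__main__":
    hypothetical_point_budget = 50_000  # illustrative value, not a reported number
    for num_views in (2, 4, 8):
        for side in (256, 512, 1024):
            dense = view_centric_gaussian_count(num_views, side, side)
            compact = human_centric_gaussian_count(hypothetical_point_budget)
            print(f"views={num_views} res={side}x{side}: "
                  f"view-centric={dense:,} human-centric={compact:,}")
```

Under these assumptions the view-centric count scales with the product of view count and pixel count, while the point-anchored count is independent of both.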

Method

PointSplat pipeline. Given a sparse set of input views, we first estimate a coarse geometric proxy of the human and perform ray-casting against this proxy to construct a compact set of 3D points that lie on the visible foreground surface, eliminating background and inter-view redundancy. Each point is then projected back to gather multi-view image features, and a Point-Image Transformer fuses these appearance features with point-wise geometry features to predict the full set of 3D Gaussian primitives (position offset, color, opacity, scale, and rotation) in a single forward pass. Because predictions are anchored on a shared 3D point set rather than per-view 2D maps, the model produces a substantially more compact representation while maintaining high-fidelity novel-view rendering.
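The step of projecting each 3D point back into the input views to gather image features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with world-to-camera rotation and translation, a single-channel feature map stored as nested lists (at least 2×2), and bilinear sampling. The names `project_point` and `bilinear_sample` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def project_point(
    point: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
) -> Optional[Tuple[float, float, float]]:
    """Project a world-space point with an assumed pinhole model.

    Returns (u, v, depth) in pixel units, or None if the point is behind the camera.
    """
    # Camera-space coordinates: R @ p + t
    xc, yc, zc = (
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0.0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy, zc)


def bilinear_sample(feature_map: List[List[float]], u: float, v: float) -> Optional[float]:
    """Bilinearly sample feature_map[row][col] at pixel (u, v); None if out of bounds."""
    height, width = len(feature_map), len(feature_map[0])
    if not (0.0 <= u <= width - 1 and 0.0 <= v <= height - 1):
        return None
    # Clamp the base index so the 2x2 neighbourhood stays inside the map.
    x0, y0 = min(int(u), width - 2), min(int(v), height - 2)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * feature_map[y0][x0] + ax * feature_map[y0][x0 + 1]
    bottom = (1 - ax) * feature_map[y0 + 1][x0] + ax * feature_map[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    features = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    hit = project_point((0.0, 0.0, 2.0), identity, (0.0, 0.0, 0.0), 100.0, 100.0, 1.5, 1.5)
    if hit is not None:
        u, v, depth = hit
        print(u, v, depth, bilinear_sample(features, u, v))
```

In a full pipeline the per-view samples for each point would be passed to the fusion network together with the point's geometry features; the sketch stops at the sampling step and does not model visibility, which the method obtains from ray-casting against the proxy.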

Citation

@article{guo2026pointsplat,
  title={PointSplat: Compact Gaussian Splatting via Human-Centric Prediction},
  author={Yujie Guo and Yudong Jin and Lingteng Qiu and Zehong Shen and Zhen Xu and Jing Zhang and Xianchao Shen and Hujun Bao and Sida Peng and Xiaowei Zhou},
  journal={arXiv preprint arXiv:2606.32036},
  year={2026}
}
