University of Oulu

Nguyen-Ha, P., Sarafianos, N., Lassner, C., Heikkilä, J., Tung, T. (2022). Free-Viewpoint RGB-D Human Performance Capture and Rendering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_27

Free-viewpoint RGB-D human performance capture and rendering

Author: Nguyen-Ha, Phong1; Sarafianos, Nikolaos2; Lassner, Christoph2; Heikkilä, Janne1; Tung, Tony2
Organizations: 1Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
2Meta Reality Labs Research, Sausalito, USA
Format: article
Version: accepted version
Access: embargoed
Persistent link: http://urn.fi/urn:nbn:fi-fe2023040635462
Language: English
Published: Springer Nature, 2022
Publish Date: 2024-10-21
Description:

Abstract

Capturing and faithfully rendering photorealistic humans from novel views is a fundamental problem for AR/VR applications. While prior work has shown impressive performance-capture results in laboratory settings, it is non-trivial to achieve casual free-viewpoint human capture and rendering for unseen identities with high fidelity, especially for facial expressions, hands, and clothes. To tackle these challenges, we introduce a novel view synthesis framework that generates realistic renders from unseen views of any human captured from a single-view, sparse RGB-D sensor, akin to a low-cost depth camera, and without actor-specific models. We propose an architecture that creates dense feature maps in novel views via sphere-based neural rendering, and produces complete renders using a global context inpainting model. Additionally, an enhancer network improves the overall fidelity, even in areas occluded from the original view, producing crisp renders with fine details. We show that our method generates high-quality novel views of synthetic and real human actors given a single-stream, sparse RGB-D input. It generalizes to unseen identities and new poses, and faithfully reconstructs facial expressions. Our approach outperforms prior view synthesis methods and is robust to different levels of depth sparsity.
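The core geometric step behind the point/sphere-based rendering described above can be illustrated with a minimal sketch: back-project single-view RGB-D pixels to 3D, transform them into a novel camera, and z-buffer-splat them into a (sparse) novel-view image that a learned inpainting model would then complete. All function names and the simple pinhole model below are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map (H, W) to camera-space 3D points (H*W, 3),
    assuming a pinhole camera with intrinsics K (3x3). Illustrative only."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # per-pixel ray directions
    return rays * depth.reshape(-1, 1)       # scale rays by depth -> 3D points

def splat(points, colors, K, R, t, hw):
    """Z-buffer splat 3D points (N, 3) with colors (N, 3) into a novel view
    given rotation R, translation t, and output size hw = (H, W)."""
    h, w = hw
    cam = points @ R.T + t                   # transform into novel camera
    valid = cam[:, 2] > 1e-6                 # keep points in front of camera
    cam, colors = cam[valid], colors[valid]
    proj = cam @ K.T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    img = np.zeros((h, w, 3))
    zbuf = np.full((h, w), np.inf)
    inb = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    for (x, y), z, c in zip(uv[inb], cam[inb, 2], colors[inb]):
        if z < zbuf[y, x]:                   # keep the nearest point per pixel
            zbuf[y, x] = z
            img[y, x] = c
    return img
```

As a sanity check, splatting the back-projected points into the original camera (identity rotation, zero translation) reproduces the input colors; under a genuinely novel viewpoint the result is a sparse image with holes, which in the paper's pipeline is densified and completed by the inpainting and enhancer networks.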


Series: Lecture notes in computer science
ISSN: 0302-9743
ISSN-E: 1611-3349
ISSN-L: 0302-9743
ISBN: 978-3-031-19787-1
ISBN Print: 978-3-031-19786-4
Volume: 13676
Pages: 473–491
DOI: 10.1007/978-3-031-19787-1_27
OADOI: https://oadoi.org/10.1007/978-3-031-19787-1_27
Host publication: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI
Host publication editor: Avidan, Shai
Brostow, Gabriel
Cissé, Moustapha
Farinella, Giovanni Maria
Hassner, Tal
Conference: European Conference on Computer Vision
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Subjects:
Copyright information: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.