University of Oulu

Pedone, M., Mostafa, A. & Heikkilä, J. Learning-Based Non-rigid Video Depth Estimation Using Invariants to Generalized Bas-Relief Transformations. J Math Imaging Vis 64, 993–1009 (2022).

Learning-based non-rigid video depth estimation using invariants to generalized bas-relief transformations

Author: Pedone, Matteo¹; Mostafa, Abdelrahman¹; Heikkilä, Janne¹
Organizations: ¹Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3.3 MB)
Language: English
Published: Springer Nature, 2022
Publish Date: 2023-06-08


We present a method to locally reconstruct dense video depth maps of a non-rigidly deformable object directly from a video sequence acquired by a static orthographic camera. Depth is estimated locally on spatiotemporal patches of the video, and the full depth video is then recovered by combining the patch estimates. Since a local spatiotemporal patch of a deforming non-rigid object is often geometrically simple enough to be faithfully represented with a parametric model, we synthetically generate a database of small deforming rectangular meshes rendered with different material properties and lighting conditions, together with their corresponding depth videos, and use these data to train a convolutional neural network. Because the database images are rendered with an orthographic camera model, linear deformations along the optical axis cannot be recovered from the training images. These deformations are known in the literature as generalized bas-relief (GBR) transformations. We address this ambiguity by applying the invariant-theoretic normalization procedure to obtain complete invariants with respect to this group of transformations, and we use these invariants in the loss function of the neural network. We tested our method on both synthetic and Kinect data and observed experimentally that the reconstruction error is significantly lower than that obtained with conventional non-rigid structure-from-motion approaches and state-of-the-art video depth estimation techniques.
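To make the idea of a GBR-invariant loss concrete, the sketch below uses the classical GBR form z → λz + μx + νy, plus a constant offset τ along the optical axis (including τ, and sharing one set of parameters across all frames of a patch, are both assumptions). The abstract does not spell out the normalization, so this is NOT the authors' invariant-theoretic procedure. It is a simple least-squares stand-in: it subtracts the best-fit plane a + bx + cy and scales the residual to unit norm. For λ > 0, two patches have the same normalized form exactly when one is a GBR transform of the other. The names `gbr_transform`, `gbr_normalize` and `gbr_invariant_loss` are hypothetical.

```python
import numpy as np


def _pixel_grid(h, w):
    # Column (x) and row (y) coordinates of an h-by-w patch, as floats.
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return x, y


def gbr_transform(depth, lam, mu, nu, tau=0.0):
    """Apply z -> lam*z + mu*x + nu*y + tau to every frame of a (T, H, W) patch.

    Assumption: one set of GBR parameters is shared by all frames.
    """
    _, h, w = depth.shape
    x, y = _pixel_grid(h, w)
    return lam * depth + mu * x + nu * y + tau  # (H, W) terms broadcast over T


def gbr_normalize(depth, eps=1e-12):
    """Remove the best-fit plane a + b*x + c*y (shared across frames) and
    scale the residual to unit norm.

    The result is unchanged by gbr_transform with lam > 0; lam < 0 flips its sign.
    """
    t, h, w = depth.shape
    x, y = _pixel_grid(h, w)
    basis = np.stack([np.ones(h * w), x.ravel(), y.ravel()], axis=1)
    A = np.tile(basis, (t, 1))      # same plane basis repeated for every frame
    z = depth.reshape(-1)           # frame-major, row-major order matches A
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    r = z - A @ coef                # component orthogonal to span{1, x, y}
    n = np.linalg.norm(r)
    if n < eps:
        raise ValueError("patch depth is planar; the normalized form is undefined")
    return (r / n).reshape(t, h, w)


def gbr_invariant_loss(pred, target, allow_flip=True):
    """Squared distance between normalized depths (hypothetical loss)."""
    a, b = gbr_normalize(pred), gbr_normalize(target)
    loss = np.sum((a - b) ** 2)
    if allow_flip:  # also treat lam < 0 as equivalent to lam > 0
        loss = min(loss, np.sum((a + b) ** 2))
    return float(loss)
```

Because the plane is fitted jointly over all frames, a GBR whose parameters change from frame to frame would instead call for per-frame normalization. For training, the same operations would have to be expressed in an automatic-differentiation framework rather than NumPy.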


Series: Journal of mathematical imaging and vision
ISSN: 0924-9907
ISSN-E: 1573-7683
ISSN-L: 0924-9907
Volume: 64
Issue: 9
Pages: 993 - 1009
DOI: 10.1007/s10851-022-01105-y
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences; 213 Electronic, automation and communications engineering, electronics
Funding: The authors would like to thank Prof. Peter J. Olver for his technical advice, and Academy of Finland for the financial support for this research (grant no. 297732). Open Access funding provided by University of Oulu including Oulu University Hospital.
Academy of Finland Grant Number: 297732
Detailed Information: 297732 (Academy of Finland Funding decision)
Copyright information: © The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit