University of Oulu

L. Huynh, M. Pedone, P. Nguyen, J. Matas, E. Rahtu and J. Heikkilä, "Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss," 2021 International Conference on 3D Vision (3DV), 2021, pp. 228-238, doi: 10.1109/3DV53792.2021.00033

Monocular depth estimation primed by salient point detection and normalized Hessian loss

Author: Huynh, Lam¹; Pedone, Matteo¹; Nguyen, Phong¹;
Organizations: ¹University of Oulu
²Czech Technical University in Prague
³Tampere University
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 8.9 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2021
Publish Date: 2022-02-23


Deep neural networks have recently achieved strong results in single-image depth estimation. However, current developments on this topic highlight an apparent trade-off between accuracy and network size. This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. Specifically, we use a sparse set of keypoints to train a FuSaNet model that consists of two major components: Fusion-Net and Saliency-Net. In addition, we introduce a normalized Hessian loss term that is invariant to scaling and shear along the depth direction, which is shown to substantially improve accuracy. The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model that is 3.1–38.4 times smaller, in number of parameters, than baseline approaches. Experiments on SUN-RGBD further demonstrate the generalizability of the proposed method.
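The invariance property claimed for the normalized Hessian loss can be illustrated with a small sketch. This record does not give the paper's exact formulation, so everything below is an assumption made for illustration: the central-difference Hessian, the normalization by a global Frobenius norm, the mean absolute comparison, and all function names are hypothetical. The sketch relies only on two general facts: second derivatives of a linear term a*x + b*y + c vanish (so a shear or offset along depth drops out), and dividing by the norm removes a positive scale factor.

```python
import math

def hessian(depth):
    """Central-difference second derivatives (dxx, dyy, dxy) at interior pixels."""
    rows, cols = len(depth), len(depth[0])
    entries = []
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dxx = depth[y][x + 1] - 2.0 * depth[y][x] + depth[y][x - 1]
            dyy = depth[y + 1][x] - 2.0 * depth[y][x] + depth[y - 1][x]
            dxy = (depth[y + 1][x + 1] - depth[y + 1][x - 1]
                   - depth[y - 1][x + 1] + depth[y - 1][x - 1]) / 4.0
            entries.append((dxx, dyy, dxy))
    return entries

def normalized_hessian(depth, eps=1e-12):
    """Divide every Hessian entry by a global Frobenius norm (assumed normalization)."""
    entries = hessian(depth)
    # dxy appears twice in the symmetric 2x2 Hessian, hence the factor 2.
    norm = math.sqrt(sum(dxx * dxx + dyy * dyy + 2.0 * dxy * dxy
                         for dxx, dyy, dxy in entries))
    return [tuple(v / (norm + eps) for v in e) for e in entries]

def normalized_hessian_loss(pred, target):
    """Mean absolute difference between the two normalized Hessian fields."""
    hp, ht = normalized_hessian(pred), normalized_hessian(target)
    total = sum(abs(a - b) for ep, et in zip(hp, ht) for a, b in zip(ep, et))
    return total / len(hp)

SIZE = 8
# A smooth synthetic depth map (illustrative data only).
depth = [[1.0 + 0.1 * x * x + 0.05 * x * y + 0.2 * math.sin(y)
          for x in range(SIZE)] for y in range(SIZE)]
# Same surface after scaling by 3 and adding a shear plus offset along depth.
warped = [[3.0 * depth[y][x] + 0.7 * x - 0.4 * y + 2.0
           for x in range(SIZE)] for y in range(SIZE)]
# A genuinely different surface, for contrast.
other = [[1.0 + 0.3 * y * y for x in range(SIZE)] for y in range(SIZE)]

print("scaled and sheared copy:", normalized_hessian_loss(depth, warped))
print("different surface:      ", normalized_hessian_loss(depth, other))
```

Under these assumptions the first value is zero up to floating-point rounding, while the second is clearly positive, which is the behavior the abstract describes as invariance to scaling and shear along the depth direction.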


Series: International Conference on 3D Vision proceedings
ISSN: 2378-3826
ISSN-E: 2475-7888
ISSN-L: 2378-3826
ISBN: 978-1-6654-2688-6
ISBN Print: 978-1-6654-2689-3
Pages: 228 - 238
DOI: 10.1109/3DV53792.2021.00033
Host publication: 2021 International Conference on 3D Vision (3DV)
Conference: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Copyright information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.