
Fang L., Liu X., Liu L., Xu H., Kang W. (2020) JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image. In: Vedaldi A., Bischof H., Brox T., Frahm JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_8

JGR-P2O: joint graph reasoning based pixel-to-offset prediction network for 3D hand pose estimation from a single depth image

Author: Fang, Linpu1; Liu, Xingyan1; Liu, Li2; Xu, Hang3; Kang, Wenxiong1
Organizations: 1South China University of Technology, China
2Center for Machine Vision and Signal Analysis, University of Oulu, Finland
3Huawei Noah's Ark Lab
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.2 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe202102154793
Language: English
Published: Springer Nature, 2020
Publish Date: 2021-02-15

Abstract

State-of-the-art methods for 3D hand pose estimation from a single depth image rely on dense predictions, including voxel-to-voxel prediction, point-to-point regression, and pixel-wise estimation. Despite their good performance, these methods suffer from inherent issues, such as a poor trade-off between accuracy and efficiency, and feature representation learning limited to local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address these issues. The key ideas are two-fold: (a) explicitly modeling the dependencies among joints and the relations between pixels and joints for better local feature representation learning; (b) unifying dense pixel-wise offset predictions and direct joint regression for end-to-end training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model the complex dependencies among joints and augment the representation capability of each pixel. We then densely estimate all pixels’ offsets to the joints in both the image plane and depth space, and compute each joint’s position as a weighted average over all pixels’ predictions, entirely discarding complex post-processing operations. The proposed model is implemented with an efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M parameters. Extensive experiments on multiple 3D hand pose estimation benchmarks demonstrate that the proposed method achieves new state-of-the-art accuracy while running efficiently at around 110 fps on a single NVIDIA 1080Ti GPU. The code is available at https://github.com/fanglinpu/JGR-P2O.
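The central step the abstract describes — recovering each joint as a weighted average of dense per-pixel offset predictions, with no post-processing — can be sketched in a few lines. The shapes, function name, and softmax normalization below are illustrative assumptions, not the authors' exact implementation (see the GitHub repository above for that):

```python
# Minimal NumPy sketch of pixel-to-offset aggregation, assuming the network
# outputs per-pixel offsets and per-pixel confidences for each joint.
import numpy as np

def aggregate_joints(offsets, weights, depth):
    """Recover 3D joint positions from dense per-pixel predictions.

    offsets: (J, H, W, 3) per-pixel offsets to each of J joints (du, dv, dz)
    weights: (J, H, W)    unnormalized per-pixel confidence for each joint
    depth:   (H, W)       input depth image
    returns: (J, 3)       estimated (u, v, z) position of each joint
    """
    # Pixel coordinate grid; each pixel "votes" from its own (u, v, z) location.
    H, W = depth.shape
    vs, us = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    base = np.stack([us, vs, depth], axis=-1).astype(np.float64)  # (H, W, 3)

    # Per-pixel candidate joint positions: pixel location plus predicted offset.
    candidates = base[None] + offsets  # (J, H, W, 3)

    # Softmax over all pixels turns confidences into aggregation weights, so
    # each joint estimate is a differentiable weighted average -- no argmax or
    # other post-processing, which keeps the model trainable end to end.
    w = np.exp(weights - weights.max(axis=(1, 2), keepdims=True))
    w /= w.sum(axis=(1, 2), keepdims=True)  # (J, H, W)

    return (candidates * w[..., None]).sum(axis=(1, 2))  # (J, 3)
```

Because the weighted average is differentiable, the pixel-wise offset maps and the regressed joint positions can be supervised jointly, which is the unification of dense prediction and direct regression that the abstract refers to.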


Series: Lecture notes in computer science
ISSN: 0302-9743
ISSN-E: 1611-3349
ISSN-L: 0302-9743
ISBN: 978-3-030-58539-6
ISBN Print: 978-3-030-58538-9
Pages: 120–137
DOI: 10.1007/978-3-030-58539-6_8
OADOI: https://oadoi.org/10.1007/978-3-030-58539-6_8
Host publication: Computer Vision – ECCV 2020
Host publication editors: Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J.-M.
Conference: European Conference on Computer Vision
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61976095, and in part by the Science and Technology Planning Project of Guangdong Province under Grant 2018B030323026. This work was also partially supported by the Academy of Finland.
Copyright information: © Springer Nature Switzerland AG 2020. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ECCV 2020. ECCV 2020. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-58539-6_8.