University of Oulu

X. Liu, H. Shi, X. Hong, H. Chen, D. Tao and G. Zhao, "Hidden States Exploration for 3D Skeleton-Based Gesture Recognition," 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 2019, pp. 1846-1855. doi: 10.1109/WACV.2019.00201

Hidden states exploration for 3D skeleton-based gesture recognition

Author: Liu, Xin1,2; Shi, Henglin1; Hong, Xiaopeng1;
Organizations: 1Center for Machine Vision and Signal Analysis, The University of Oulu, Finland
2UBTech Sydney AI Institute and SIT, FEIT, The University of Sydney, Australia
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.5 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2019
Publish Date: 2019-07-29


3D skeletal data has recently attracted wide attention in human behavior analysis for its robustness to scene variation, yet accurate gesture recognition remains challenging. The main reason is the high intra-class variance caused by temporal dynamics. One solution is to resort to generative models such as the hidden Markov model (HMM). However, existing methods commonly assume fixed anchors for each hidden state, which makes it hard to depict the explicit temporal structure of gestures. Based on the observation that a gesture is a time series with distinctly defined phases, we propose a new formulation that builds temporal compositions of gestures by low-rank matrix decomposition. The only assumption is that the gesture’s “hold” phases with static poses are linearly correlated with each other. As such, a gesture sequence can be segmented into temporal states with semantically meaningful and discriminative concepts. Furthermore, unlike traditional HMMs, which tend to use a specific distance metric for clustering and ignore temporal contextual information when estimating the emission probability, we utilize a Long Short-Term Memory (LSTM) network to learn probability distributions over the HMM states. The proposed method is validated on two challenging datasets. Experiments demonstrate that our approach works effectively on a wide range of gestures and actions and achieves state-of-the-art performance.
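The abstract's key assumption is that frames within a "hold" phase are linearly correlated, so a window of static poses spans a low-rank subspace. A minimal sketch of that idea (not the paper's implementation; the joint count, noise level, and tolerance below are illustrative) uses a singular-value threshold to estimate the effective rank of a window of skeleton frames:

```python
import numpy as np

def effective_rank(window, tol=1e-2):
    """Count singular values above tol * largest singular value.

    window: (frames, joints * coords) matrix of stacked skeleton poses.
    A 'hold' phase (one static pose repeated) yields rank ~1.
    """
    s = np.linalg.svd(window, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
dim = 20 * 3  # e.g. 20 joints with x, y, z coordinates (illustrative)

# "Hold" phase: one static pose repeated with tiny sensor noise.
pose = rng.standard_normal(dim)
hold = np.stack([pose + 1e-4 * rng.standard_normal(dim) for _ in range(10)])

# Motion phase: pose changes every frame, so frames are uncorrelated.
motion = rng.standard_normal((10, dim))

print(effective_rank(hold))    # low rank: frames are linearly correlated
print(effective_rank(motion))  # full rank: frames are independent
```

Sliding such a window over a sequence and thresholding the rank is one way such hold phases could be located, giving the temporal segmentation points between states that the paper's formulation exploits.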


Series: IEEE Winter Conference on Applications of Computer Vision
ISSN: 1550-5790
ISSN-E: 2472-6737
ISSN-L: 1550-5790
ISBN: 978-1-7281-1975-5
ISBN Print: 978-1-7281-1976-2
Pages: 1846 - 1855
Article number: 8659023
DOI: 10.1109/WACV.2019.00201
Host publication: 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Conference: IEEE Winter Conference on Applications of Computer Vision
Type of Publication: A4 Article in conference proceedings
Field of Science: 213 Electronic, automation and communications engineering, electronics
Copyright information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.