University of Oulu

H. Chen, X. Liu, J. Shi and G. Zhao, "Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition," in IEEE Transactions on Image Processing, vol. 29, pp. 9689-9702, 2020, doi: 10.1109/TIP.2020.3028962

Temporal hierarchical dictionary guided decoding for online gesture segmentation and recognition

Author: Chen, Haoyu1; Liu, Xin1; Shi, Jingang2; Zhao, Guoying1
Organizations: 1Center for Machine Vision and Signal Analysis, University of Oulu, FI-90014, Finland
2School of Software, Xi’an Jiaotong University, China
Format: article
Version: accepted version
Access: open
Language: English
Published: Institute of Electrical and Electronics Engineers, 2020
Publish Date: 2020-12-03


Online segmentation and recognition of skeleton-based gestures are challenging. Compared with offline cases, inference in online settings can rely only on the few most recent frames and must complete before the whole temporal movement has been performed. However, incompletely performed gestures are ambiguous, and their early recognition is prone to falling into local optima. In this work, we address the problem with a temporal hierarchical dictionary that guides the hidden Markov model (HMM) decoding procedure. The intuition is that gestures are ambiguous, with high uncertainty, in their early performing phases and become discriminative only after a certain phase. This uncertainty can naturally be measured by entropy. Thus, we propose a measurement called the “relative entropy map” (REM) to encode this temporal context and guide HMM decoding. Furthermore, we introduce a progressive learning strategy with which neural networks learn robust recognition of HMM states in an iterative manner. The performance of our method is intensively evaluated on three challenging databases and achieves state-of-the-art results. Our method demonstrates the abilities of both extracting discriminative connotations and reducing large redundancy in the HMM transition process. It is verified that our framework can achieve online recognition of continuous gesture streams even when gestures are only halfway performed.
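The abstract's key intuition — that an incompletely performed gesture yields a near-uniform, high-entropy class posterior that sharpens as the gesture completes — can be illustrated with a minimal sketch. This is not the paper's REM construction; the posterior values below are hypothetical, and only the frame-wise Shannon entropy computation is shown:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical class posteriors over 4 gesture classes at two phases:
# early frames are ambiguous (near-uniform), later frames are confident.
early = [0.30, 0.28, 0.22, 0.20]   # gesture halfway performed
late  = [0.90, 0.05, 0.03, 0.02]   # gesture near completion

print(round(entropy(early), 3))  # high uncertainty
print(round(entropy(late), 3))   # low uncertainty
```

The drop in entropy between the two phases is the kind of temporal context the REM is described as encoding to guide HMM decoding.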


Series: IEEE transactions on image processing
ISSN: 1057-7149
ISSN-E: 1941-0042
ISSN-L: 1057-7149
Volume: 29
Pages: 9689 - 9702
DOI: 10.1109/TIP.2020.3028962
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work is supported by the Academy of Finland project MiGA (grant 316765), the ICT 2023 project (grant 328115), and Infotech Oulu, and in part by the China Scholarship Council. The authors also wish to acknowledge CSC — IT Center for Science, Finland, for computational resources. Xin Liu is supported by an Academy of Finland postdoctoral researcher project (grant 331146).
Academy of Finland Grant Number: 316765
Detailed Information: 316765 (Academy of Finland Funding decision)
328115 (Academy of Finland Funding decision)
331146 (Academy of Finland Funding decision)
Copyright information: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.