University of Oulu

H. Chen, X. Liu and G. Zhao, "Temporal Hierarchical Dictionary with HMM for Fast Gesture Recognition," 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, 2018, pp. 3378-3383. doi: 10.1109/ICPR.2018.8546245

Temporal hierarchical dictionary with HMM for fast gesture recognition

Author: Chen, Haoyu¹; Liu, Xin¹; Zhao, Guoying¹
Organizations: ¹Center for Machine Vision and Signal Analysis, University of Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.5 MB)
Language: English
Published: IEEE Computer Society, 2018
Publish Date: 2019-02-26


In this paper, we propose a novel temporal hierarchical dictionary with a hidden Markov model (HMM) for the gesture recognition task. Dictionaries with spatio-temporal elements have been commonly used for gesture recognition. However, existing spatio-temporal dictionary-based methods need whole pre-segmented gestures for inference and therefore have difficulty handling nonstationary sequences. The proposed method combines an HMM with Deep Belief Networks (DBN) to tackle both gesture segmentation and recognition through inference at the frame level. In addition, we investigate the redundancy in dictionaries and introduce relative entropy to measure the information richness of a dictionary. Furthermore, when an element is inferred, a dictionary that is flat in its temporal hierarchy is searched in its entirety every time, so the temporal structure of gestures is not sufficiently utilized. The proposed temporal hierarchical dictionary is organized in HMM states and can limit the search range to distinct states. Our framework has three key novel properties: (1) a temporal hierarchical structure with HMM, which makes both the HMM transition and Viterbi decoding more efficient; (2) a relative entropy model that compresses the dictionary to reduce redundancy; (3) an unsupervised hierarchical clustering algorithm that builds a hierarchical dictionary automatically. Our method is evaluated on two gesture datasets and consistently achieves state-of-the-art performance. The results indicate that dictionary redundancy has a significant impact on performance and that it can be tackled by a temporal hierarchy and an entropy model.
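For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal textbook sketch of relative entropy (Kullback-Leibler divergence) and of Viterbi decoding over HMM states. It is not the authors' implementation: the function names `relative_entropy` and `viterbi`, the list-based inputs, and the use of log-probabilities are all assumptions made for illustration, and the sketch does not include the paper's hierarchical dictionary or DBN components.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(p || q) in nats for two discrete distributions.

    Assumes p and q have the same length and q[i] > 0 wherever p[i] > 0.
    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely HMM state path for a sequence of frames.

    log_init[s]     : log-probability of starting in state s
    log_trans[r][s] : log-probability of moving from state r to state s
    log_emit[t][s]  : log-likelihood of frame t under state s
    (All three layouts are illustrative assumptions.)
    """
    n_states = len(log_init)
    # Best log-score of any path ending in each state at the first frame.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t

    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this generic form every state is considered at every frame. The abstract's point is that organizing the dictionary by HMM state lets the search be limited to distinct states; how that restriction is realized is described in the paper itself and is not reproduced here.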


Series: International Conference on Pattern Recognition
ISSN: 1051-4651
ISSN-L: 1051-4651
ISBN: 978-1-5386-3788-3
ISBN Print: 978-1-5386-3789-0
Pages: 3378 - 3383
DOI: 10.1109/ICPR.2018.8546245
Host publication: 2018 24th International Conference on Pattern Recognition (ICPR)
Conference: International Conference on Pattern Recognition
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences; 213 Electronic, automation and communications engineering, electronics
Funding: This work was supported by the Academy of Finland, Tekes Fidipro program (Grant No. 1849/31/2015) and Business Finland project (Grant No. 3116/31/2017), Infotech Oulu, and the National Natural Science Foundation of China (Grants No. 61601362 and 61772419). Haoyu Chen is supported by the China Scholarship Council. The authors also wish to acknowledge the Nokia visiting professor grant, and CSC IT Center for Science, Finland, for computational resources. Special thanks to Jiawei and Henglin for their selfless help.
Copyright information: © 2018 European Union. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.