University of Oulu

Wei Peng, Jingang Shi, Tuomas Varanka, Guoying Zhao, Rethinking the ST-GCNs for 3D skeleton-based human action recognition, Neurocomputing, Volume 454, 2021, Pages 45-53, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2021.05.004

Rethinking the ST-GCNs for 3D skeleton-based human action recognition

Author: Peng, Wei1; Shi, Jingang2; Varanka, Tuomas1; Zhao, Guoying1
Organizations: 1CMVS, University of Oulu, Oulu, Finland
2School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 0.9 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2021120859628
Language: English
Published: Elsevier, 2021
Publish Date: 2021-12-08
Description:

Abstract

Skeletal data has become an alternative input for the human action recognition task, as it provides more compact and distinct information than the traditional RGB input. However, unlike RGB input, skeleton data lies in a non-Euclidean space where traditional deep learning methods cannot realize their full potential. Fortunately, with the emerging trend of geometric deep learning, the spatial-temporal graph convolutional network (ST-GCN) has been proposed to address action recognition from skeleton data. ST-GCN and its variants fit skeleton-based action recognition well and have become the mainstream frameworks for this task. However, both the efficiency and the performance of these models are hindered either by fixing the skeleton joint correlations or by employing a computationally expensive strategy to construct a dynamic topology for the skeleton. We argue that many of these operations are either unnecessary or even harmful for the task. By theoretically and experimentally analysing the state-of-the-art ST-GCNs, we provide a simple but efficient strategy to capture the global graph correlations and thus efficiently model the representation of the input graph sequences. Moreover, the global graph strategy also reduces the graph sequence into the Euclidean space, so a multi-scale temporal filter is introduced to efficiently capture the dynamic information. With this method, we not only extract the graph correlations better and with far fewer parameters (only 12.6% of the current best), but also achieve superior performance. Extensive experiments on the current largest 3D skeleton datasets, NTU-RGB+D and NTU-RGB+D 120, demonstrate that our network handles this task both efficiently and with a lightweight model.

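The abstract describes two components: a global graph strategy that captures joint correlations with a single shared topology, and a multi-scale temporal filter that models the dynamics once the sequence is reduced to Euclidean space. The sketch below is a minimal, illustrative PyTorch rendering of that idea, not the authors' actual implementation; the class names, the learnable-adjacency initialisation, and the kernel sizes (3, 5, 7) are our assumptions.

```python
# Illustrative sketch of the idea in the abstract: one learnable *global*
# adjacency matrix models joint correlations (instead of a per-sample dynamic
# topology), and parallel temporal convolutions model the dynamics.
# All names and hyperparameters here are assumptions, not from the paper.
import torch
import torch.nn as nn


class GlobalGraphBlock(nn.Module):
    """Spatial modelling with one learnable global adjacency shared by all frames."""

    def __init__(self, in_channels, out_channels, num_joints):
        super().__init__()
        # Learnable global joint-correlation matrix (V x V), initialised near identity.
        self.adjacency = nn.Parameter(
            torch.eye(num_joints) + 1e-3 * torch.randn(num_joints, num_joints)
        )
        self.project = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        x = torch.einsum("nctv,vw->nctw", x, self.adjacency)  # aggregate over joints
        return self.project(x)


class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions with different kernel sizes, concatenated."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_channels = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, branch_channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        )

    def forward(self, x):
        # x: (N, C, T, V); each branch filters along the temporal axis only.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    x = torch.randn(2, 64, 100, 25)      # 2 clips, 64 channels, 100 frames, 25 joints
    spatial = GlobalGraphBlock(64, 96, num_joints=25)
    temporal = MultiScaleTemporalConv(96)
    out = temporal(spatial(x))
    print(out.shape)                      # torch.Size([2, 96, 100, 25])
```

In this reading, joint correlations are captured by one shared V x V matrix rather than a per-sample attention map, which is what would keep the parameter count and computation low; the exact design choices in the paper may differ.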

Series: Neurocomputing
ISSN: 0925-2312
ISSN-E: 1872-8286
ISSN-L: 0925-2312
Volume: 454
Pages: 45 - 53
DOI: 10.1016/j.neucom.2021.05.004
OADOI: https://oadoi.org/10.1016/j.neucom.2021.05.004
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was supported by the Academy of Finland ICT 2023 project (grant 328115), the Academy of Finland project MiGA (grant 316765), the National Natural Science Foundation of China under Grant 62002283, and Infotech Oulu. The authors also wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.
Academy of Finland Grant Numbers: 328115, 316765
Detailed Information: 328115 (Academy of Finland funding decision); 316765 (Academy of Finland funding decision)
Copyright information: © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).