Wei Peng, Xiaopeng Hong, Guoying Zhao, Tripool: Graph triplet pooling for 3D skeleton-based action recognition, Pattern Recognition, Volume 115, 2021, 107921, ISSN 0031-3203, https://doi.org/10.1016/j.patcog.2021.107921
Tripool: graph triplet pooling for 3D skeleton-based action recognition
|Author:||Peng, Wei¹; Hong, Xiaopeng¹,²; Zhao, Guoying³,¹|
¹Center for Machine Vision and Signal Analysis, University of Oulu, Finland
²School of Cyber Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, PRC
³School of Information and Technology, Northwest University, PRC
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2021041410351
|Publish Date:|| 2021-04-14
Graph Convolutional Networks (GCNs) have already been successfully applied to skeleton-based action recognition. However, current GCNs for this task lack pooling operations, so their architectures are inherently flat. A flat architecture not only increases the computational complexity but also requires more memory to keep the entire graph embedding. More seriously, it forces the high-level semantic feature representations to have the same physical structure as the low-level input skeletons, which we argue is unreasonable and harmful to the final performance. To address these issues, we propose Tripool, a novel graph pooling method for 3D action recognition from skeleton data. Tripool optimizes a triplet pooling loss, which takes both the graph topology and the global graph context into consideration, to learn a hierarchical graph representation. Training the graph pooling is efficient because it optimizes the graph topology by minimizing an upper bound of the pooling loss. In addition, because the graph changes after pooling, Tripool automatically generates an embedding matrix for the pooled graph. On the one hand, Tripool reduces the computational cost by removing redundant nodes. On the other hand, it overcomes the limitation that the topology constraint imposes on the high-level semantic representations, thus improving the final performance. Tripool can be combined with various graph neural networks in an end-to-end fashion. We conduct comprehensive experiments on the two largest-scale 3D datasets currently available to evaluate our method. With Tripool, we consistently obtain the best results across various performance measures.
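The abstract describes graph pooling only at a high level. To show what a pooling step does to a skeleton graph (fewer nodes, a new adjacency and embedding), here is a minimal sketch of generic soft-assignment coarsening in the style of DiffPool: X' = SᵀX and A' = SᵀAS, where S is a row-softmaxed N×K assignment matrix. This is an assumption-based illustration, not Tripool's triplet pooling loss or its upper-bound optimization, which the abstract does not specify. The function names, the fixed assignment logits, and the toy 4-node graph are all hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b on nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def row_softmax(logits: Matrix) -> Matrix:
    """Soft assignment: each node's row becomes a distribution over K clusters."""
    out = []
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def pool_graph(x: Matrix, adj: Matrix, assign_logits: Matrix) -> Tuple[Matrix, Matrix]:
    """Coarsen an N-node graph into K clusters (hypothetical helper).

    x: N x F node features; adj: N x N adjacency;
    assign_logits: N x K scores (a learned layer would produce these in a
    real model; here they are given directly).
    Returns the K x F pooled features and the K x K pooled adjacency.
    """
    s = row_softmax(assign_logits)
    st = transpose(s)
    x_pooled = matmul(st, x)                 # X' = S^T X
    adj_pooled = matmul(matmul(st, adj), s)  # A' = S^T A S
    return x_pooled, adj_pooled


if __name__ == "__main__":
    # Hypothetical 4-joint chain 0-1-2-3 with 2-dimensional features.
    x = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    # Joints 0-1 lean toward cluster 0, joints 2-3 toward cluster 1.
    logits = [[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]
    xp, ap = pool_graph(x, adj, logits)
    print(len(xp), "pooled nodes")
    print([[round(v, 3) for v in row] for row in ap])
```

Pooling the 4-node chain into 2 clusters halves the number of nodes that later layers must process. Because each row of S sums to 1, the total edge weight of A and the per-feature totals of X are preserved in the pooled graph. Per the abstract, Tripool instead learns its pooling by optimizing a triplet pooling loss, so this sketch only illustrates the coarsening step itself.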
|Type of Publication:|| A1 Journal article – refereed
|Field of Science:|| 113 Computer and information sciences
This work was supported by the Academy of Finland for project MiGA (grant 316765) and ICT 2023 project (grant 328115), and partially by the National Natural Science Foundation of China (Grant No. 61772419), the Ministry of Education and Culture of Finland for the AI forum project, and Infotech Oulu. This work was also funded by the National Key Research and Development Project of China under Grant No. 2019YFB1312000 and by the National Natural Science Foundation of China under Grant No. 62076195. The authors also wish to acknowledge CSC-IT Center for Science, Finland, for computational resources.
|Academy of Finland Grant Number:|| 316765 (Academy of Finland Funding decision)
© 2021 The Author(s). Published by Elsevier. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).