Vision-based multi-modal framework for action recognition |
|
Author: | Beddiar, Djamila Romaissa 1,2; Oussalah, Mourad 2; Nini, Brahim 1 |
Organizations: |
1 Research Laboratory on Computer Science's Complex Systems, University Larbi Ben M'hidi, Oum El Bouaghi, Algeria |
2 Center for Machine Vision and Signal Analysis, University of Oulu, Finland |
Format: | article |
Version: | accepted version |
Access: | open |
Online Access: | PDF Full Text (PDF, 0.4 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2021102552121 |
Language: | English |
Published: |
IEEE Computer Society,
2021
|
Publish Date: | 2021-10-25 |
Description: |
Abstract: Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where the detection and recognition of activities can improve the quality of life and security of humans. Typically, automated, intuitive and real-time systems are required to recognize human activities and to accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted using illustrative representations obtained from a combination of dynamic RGB images, dynamic depth images and skeleton data representations. Each video is thereby represented with three images that summarize the ongoing action. Our framework takes advantage of transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.
|
Series: |
International Conference on Pattern Recognition |
ISSN: | 1051-4651 |
ISSN-L: | 1051-4651 |
ISBN: | 978-1-7281-8809-6 |
ISBN Print: | 978-1-7281-8808-9 |
Pages: | 5859 - 5866 |
DOI: | 10.1109/ICPR48806.2021.9412863 |
OADOI: | https://oadoi.org/10.1109/ICPR48806.2021.9412863 |
Host publication: |
2020 25th International Conference on Pattern Recognition (ICPR) |
Conference: |
International Conference on Pattern Recognition |
Type of Publication: |
A4 Article in conference proceedings |
Field of Science: |
113 Computer and information sciences |
Funding: |
This work is partly supported by the Algerian residential training program abroad, Outstanding National Program (PNE), which supported the first author's stay at the University of Oulu, and by the European YoungRes project (Ref. 823701); both are gratefully acknowledged.
Copyright information: |
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |