University of Oulu

B. D. Romaissa, O. Mourad and N. Brahim, "Vision-Based Multi-Modal Framework for Action Recognition," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 5859-5866, doi: 10.1109/ICPR48806.2021.9412863

Vision-based multi-modal framework for action recognition

Author: Romaissa, Beddiar Djamila (1,2); Mourad, Oussalah (2); Brahim, Nini (1)
Organizations: (1) Research Laboratory on Computer Science's Complex Systems, University Laarbi Ben M'hidi, Oum El Bouaghi, Algeria
(2) Center for Machine Vision and Signal Analysis, University of Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.4 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2021102552121
Language: English
Published: IEEE Computer Society, 2021
Publish Date: 2021-10-25
Description:

Abstract

Human activity recognition plays a central role in the development of intelligent systems for video surveillance, public security, health care and home monitoring, where the detection and recognition of activities can improve the quality of life and security of humans. Typically, automated, intuitive and real-time systems are required to recognize human activities and accurately identify unusual behaviors in order to prevent dangerous situations. In this work, we explore the combination of three modalities (RGB, depth and skeleton data) to design a robust multi-modal framework for vision-based human activity recognition. In particular, spatial information, body shape/posture and the temporal evolution of actions are highlighted using illustrative representations obtained from a combination of dynamic RGB images, dynamic depth images and skeleton data representations. Each video is therefore represented by three images that summarize the ongoing action. Our framework takes advantage of transfer learning from pre-trained models to extract significant features from these newly created images. Next, we fuse the extracted features using Canonical Correlation Analysis and train a Long Short-Term Memory network to classify actions from the visual descriptive images. Experimental results demonstrate the reliability of our feature-fusion framework, which captures highly significant features and achieves state-of-the-art performance on the public UTD-MHAD and NTU RGB+D datasets.

Series: International Conference on Pattern Recognition
ISSN: 1051-4651
ISSN-L: 1051-4651
ISBN: 978-1-7281-8809-6
ISBN Print: 978-1-7281-8808-9
Pages: 5859-5866
DOI: 10.1109/ICPR48806.2021.9412863
OADOI: https://oadoi.org/10.1109/ICPR48806.2021.9412863
Host publication: 2020 25th International Conference on Pattern Recognition (ICPR)
Conference: International Conference on Pattern Recognition
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Funding: This work is partly supported by the Algerian Residential Training Program Abroad Outstanding National Program (PNE), which supported the first author's stay at the University of Oulu, and by the European YoungRes project (Ref. 823701); both are gratefully acknowledged.
Copyright information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.