University of Oulu

X. Huang, A. Dhall, R. Goecke, M. Pietikäinen and G. Zhao, "Multimodal Framework for Analyzing the Affect of a Group of People," in IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2706-2721, Oct. 2018. doi: 10.1109/TMM.2018.2818015

Multimodal framework for analyzing the affect of a group of people

Author: Huang, Xiaohua1; Dhall, Abhinav2; Goecke, Roland3;
Organizations: 1Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90014, Finland
2Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Rupnagar 140001, India
3Human-Centred Technology Research Centre, University of Canberra, Bruce, ACT 2617, Australia
4School of Information and Technology, Northwest University, Xi’an 710069, China
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.5 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2018
Publish Date: 2019-04-05


With the advances in multimedia and the World Wide Web, users upload millions of images and videos every day on social networking platforms on the Internet. From the perspective of automatic human behavior understanding, it is of interest to analyze and model the affects that are exhibited by groups of people who are participating in social events in these images. However, the analysis of the affect that is expressed by multiple people is challenging due to the varied indoor and outdoor settings. Recently, a few interesting works have investigated face-based group-level emotion recognition (GER). In this paper, we propose a multimodal framework for enhancing the affective analysis ability of GER in challenging environments. Specifically, for encoding a person’s information in a group-level image, we first propose an information aggregation method for generating feature descriptions of face, upper body, and scene. Later, we revisit localized multiple kernel learning for fusing face, upper body, and scene information for GER against challenging environments. Intensive experiments are performed on two challenging group-level emotion databases (HAPPEI and GAFF) to investigate the roles of the face, upper body, scene information, and the multimodal framework. Experimental results demonstrate that the multimodal framework achieves promising performance for GER.
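The kernel-level fusion of face, upper body, and scene information mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it combines one Gaussian kernel per modality with fixed global weights, whereas localized multiple kernel learning would learn sample-dependent weights. The feature dimensions, the weights, and the helper names `rbf_kernel` and `fuse_kernels` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fuse_kernels(kernels, weights):
    """Convex combination of per-modality kernels (fixed weights, an assumption)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * k for wi, k in zip(w, kernels))

rng = np.random.default_rng(0)
n_images = 6
# Hypothetical per-image descriptors; random placeholders, not real features.
face = rng.normal(size=(n_images, 16))
body = rng.normal(size=(n_images, 24))
scene = rng.normal(size=(n_images, 32))

# One kernel per modality, with gamma scaled by the feature dimension.
kernels = [rbf_kernel(x, x, gamma=1.0 / x.shape[1]) for x in (face, body, scene)]
fused = fuse_kernels(kernels, weights=[0.5, 0.3, 0.2])
```

The fused matrix is again a valid kernel, so it could be passed to any kernel-based classifier or regressor that accepts a precomputed kernel.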


Series: IEEE transactions on multimedia
ISSN: 1520-9210
ISSN-E: 1941-0077
ISSN-L: 1520-9210
Volume: 20
Issue: 10
Pages: 2706 - 2721
DOI: 10.1109/TMM.2018.2818015
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was supported in part by the Jorma Ollila Grant of Nokia Foundation; in part by the Central Fund of Finnish Cultural Foundation; in part by the AI Grant of Kaute Foundation; in part by the Academy of Finland; in part by the Tekes Fidipro Program under Grant 1849/31/2015 and Tekes project (Grant No. 3116/31/2017); in part by the Infotech; and in part by the National Natural Science Foundation of China under Grant 61772419.
Copyright information: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.