University of Oulu

Z. Xia, X. Hong, X. Gao, X. Feng and G. Zhao, "Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-Expressions," in IEEE Transactions on Multimedia, vol. 22, no. 3, pp. 626-640, March 2020. https://doi.org/10.1109/TMM.2019.2931351

Spatiotemporal recurrent convolutional networks for recognizing spontaneous micro-expressions

Author: Xia, Zhaoqiang1; Hong, Xiaopeng2; Gao, Xingyu3;
Organizations: 1School of Electronics and Information, Northwestern Polytechnical University, 710129 Shaanxi, China
2Xi’an Jiaotong University, Xi’an, P. R. China
3Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China
4Center for Machine Vision and Signal Analysis, University of Oulu, 90014 Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.7 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2019120345372
Language: English
Published: Institute of Electrical and Electronics Engineers, 2020
Publish Date: 2019-12-03
Description:

Abstract

Recently, the recognition of spontaneous facial micro-expressions has attracted much attention owing to its various real-world applications. Many handcrafted and learned features have been combined with a variety of classifiers and have achieved promising performance in recognizing micro-expressions. However, micro-expression recognition remains challenging because of the subtle spatiotemporal changes of micro-expressions. To exploit the merits of deep learning, we propose a novel micro-expression recognition approach based on deep recurrent convolutional networks, which captures the spatiotemporal deformations of a micro-expression sequence. Specifically, the proposed deep model consists of several recurrent convolutional layers for extracting visual features and a classification layer for recognition. It is optimized in an end-to-end manner and obviates manual feature design. To handle sequential data, we use two ways of extending the connectivity of convolutional networks across the temporal domain, in which the spatiotemporal deformations are modeled from the viewpoints of facial appearance and geometry separately. In addition, to overcome the shortcomings of limited and imbalanced training samples, two temporal data augmentation strategies and a balanced loss are jointly used for our deep network. Through experiments on three spontaneous micro-expression datasets, we verify the effectiveness of the proposed approach in comparison with state-of-the-art methods.

Series: IEEE transactions on multimedia
ISSN: 1520-9210
ISSN-E: 1941-0077
ISSN-L: 1520-9210
Volume: 22
Issue: 3
Pages: 626 - 640
DOI: 10.1109/TMM.2019.2931351
OADOI: https://oadoi.org/10.1109/TMM.2019.2931351
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: We would like to thank the reviewers for their valuable and constructive comments, which greatly helped us to improve this work. This work is partly supported by the National Natural Science Foundation of China (Nos. 61702419, 61702491), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JQ6090).
Copyright information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.