University of Oulu

Y. Li, X. Huang and G. Zhao, "Joint Local and Global Information Learning With Single Apex Frame Detection for Micro-Expression Recognition," in IEEE Transactions on Image Processing, vol. 30, pp. 249-263, 2021, doi: 10.1109/TIP.2020.3035042

Joint local and global information learning with single apex frame detection for micro-expression recognition

Author: Li, Yante (1); Huang, Xiaohua (2); Zhao, Guoying (1)
Organizations: (1) Center for Machine Vision and Signal Analysis, University of Oulu, Finland
(2) School of Computer Engineering, Nanjing Institute of Technology, Nanjing, China
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 3.8 MB)
Persistent link:
Language: English
Published: Institute of Electrical and Electronics Engineers, 2021
Publish Date: 2021-03-17


Micro-expressions (MEs) are rapid and subtle facial movements that are difficult to detect and recognize. Most recent works have attempted to recognize MEs with spatial and temporal information from video clips. According to psychological studies, the apex frame conveys the most emotional information expressed in facial expressions. However, it is not clear how the single apex frame contributes to micro-expression recognition. To address this question, this paper first proposes a new method to detect the apex frame by estimating pixel-level change rates in the frequency domain. By using frequency information, it spots apex frames more effectively than existing methods based on spatio-temporal change information. Second, given the apex frame, this paper proposes a joint feature learning architecture that couples local and global information to recognize MEs, because not all regions contribute equally to ME recognition and some regions contain no emotional information at all. More specifically, the proposed model combines local information learned from the facial regions that contribute major emotional information with global information learned from the whole face. Leveraging both local and global information enables the model to learn discriminative ME representations and to suppress the negative influence of regions unrelated to MEs. The proposed method is extensively evaluated on the CASME, CASME II, SAMM, SMIC, and composite databases. Experimental results demonstrate that the method, using only the detected apex frame, achieves promising ME recognition performance compared with state-of-the-art methods that employ the whole ME sequence. Moreover, the results indicate that the apex frame can contribute significantly to micro-expression recognition.


Series: IEEE transactions on image processing
ISSN: 1057-7149
ISSN-E: 1941-0042
ISSN-L: 1057-7149
Volume: 30
Pages: 249 - 263
DOI: 10.1109/TIP.2020.3035042
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: This work was supported by Infotech Oulu, National Natural Science Foundation of China (Grant Nos: 61772419, 62076122), the Academy of Finland for project MiGA (grant 316765), and ICT 2023 project (grant 328115), the Jiangsu Specially-Appointed Professor Program (Grant No. 3051107219003), and the Talent Startup project of NJIT (No. YKJ201982). As well, the authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.
Academy of Finland Grant Number: 316765
Detailed Information: 316765 (Academy of Finland Funding decision)
328115 (Academy of Finland Funding decision)
Copyright information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.