University of Oulu

L. Zhang, X. Hong, O. Arandjelović and G. Zhao, "Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition," in IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 1973-1985, 1 Oct.-Dec. 2022, doi: 10.1109/TAFFC.2022.3213509

Short and long range relation based spatio-temporal transformer for micro-expression recognition

Author: Zhang, Liangfei1; Hong, Xiaopeng2; Arandjelović, Ognjen1; Zhao, Guoying3
Organizations: 1School of Computer Science, University of St Andrews, KY16 9AJ St Andrews, U.K.
2Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
3University of Oulu, 90570 Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 3.7 MB)
Persistent link:
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2023-01-05


Being spontaneous, micro-expressions are useful for inferring a person’s true emotions even when an attempt is made to conceal them. Because of their short duration and low intensity, recognizing micro-expressions is a difficult task in affective computing. Early work based on handcrafted spatio-temporal features, which showed some promise, has recently been superseded by various deep learning approaches that now compete for state-of-the-art performance. Nevertheless, capturing both local and global spatio-temporal patterns remains challenging. To this end, we propose a novel spatio-temporal transformer architecture which is, to the best of our knowledge, the first purely transformer-based approach (i.e., one that uses no convolutional network) for micro-expression recognition. The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head. A comprehensive evaluation on three widely used spontaneous micro-expression data sets, namely SMIC-HS, CASME II and SAMM, shows that the proposed approach consistently outperforms the state of the art, and that it is the first framework in the published literature on micro-expression recognition to achieve an unweighted F1-score greater than 0.9 on any of these data sets. The source code is available at

Series: IEEE transactions on affective computing
ISSN: 2371-9850
ISSN-E: 1949-3045
ISSN-L: 2371-9850
Volume: 13
Issue: 4
Pages: 1973 - 1985
DOI: 10.1109/TAFFC.2022.3213509
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was supported in part by the China Scholarship Council – University of St Andrews Scholarships under Grant 201908060250, which funds Liangfei Zhang for her PhD, in part by the National Key Research and Development Project of China under Grant 2019YFB1312000, in part by the National Natural Science Foundation of China under Grant 62076195, and in part by the Fundamental Research Funds for the Central Universities under Grant AUGA5710011522.
Copyright information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.