University of Oulu

Yu, Z., Shen, Y., Shi, J. et al. PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer. Int J Comput Vis 131, 1307–1330 (2023).

PhysFormer++ : facial video-based physiological measurement with SlowFast temporal difference transformer

Author: Yu, Zitong (1); Shen, Yuming (2); Shi, Jingang (3);
Organizations: (1) Great Bay University, Dongguan, 523000, China
(2) University of Oxford, Oxford, OX1 3PJ, UK
(3) Xi’an Jiaotong University, Xi’an, 710049, China
(4) The University of Hong Kong, Hong Kong, China
(5) University of Oulu, 90014, Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3.2 MB)
Language: English
Published: Springer Nature, 2023
Publish Date: 2023-04-03


Remote photoplethysmography (rPPG), which aims to measure heart activity and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, and thus neglect long-range spatio-temporal perception and interaction in rPPG modeling. In this paper, we propose two end-to-end video-transformer-based architectures, PhysFormer and PhysFormer++, which adaptively aggregate both local and global spatio-temporal features to enhance the rPPG representation. As the key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. To better exploit temporal contextual and periodic rPPG clues, we also extend PhysFormer to the two-pathway, SlowFast-based PhysFormer++ with temporal difference periodic and cross-attention transformers. Furthermore, we propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and PhysFormer++ and alleviate overfitting. Comprehensive experiments on four benchmark datasets show superior performance in both intra- and cross-dataset testing. Unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer family can be easily trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community.
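
The abstract mentions label distribution learning without giving details. As a rough illustration only, the idea of replacing a single heart-rate label with a soft distribution over heart-rate bins might be sketched as below. The bin range (40 to 180 bpm), the Gaussian shape, the `sigma` value, the KL-divergence loss, and the function names are all assumptions made for this sketch and are not taken from this record; the article itself should be consulted for the authors' actual formulation.

```python
import math

def hr_label_distribution(hr_bpm, lo=40, hi=180, sigma=1.0):
    """Soft label: Gaussian weights over integer heart-rate bins [lo, hi].

    Assumes hr_bpm lies inside [lo, hi]; bin range and sigma are
    illustrative choices, not values from the article.
    """
    bins = list(range(lo, hi + 1))
    weights = [math.exp(-((b - hr_bpm) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return bins, [w / total for w in weights]  # normalized to sum to 1

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Target distribution for a ground-truth heart rate of 72 bpm, compared
# with a hypothetical prediction that is centred on 80 bpm instead.
bins, target = hr_label_distribution(72.0)
_, predicted = hr_label_distribution(80.0)
loss = kl_divergence(target, predicted)
```

A soft target of this kind lets neighbouring heart-rate bins share supervision, so a prediction a few bpm away from the label is penalized less than one that is far away.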

Series: International journal of computer vision
ISSN: 0920-5691
ISSN-E: 1573-1405
ISSN-L: 0920-5691
Volume: 131
Pages: 1307 - 1330
DOI: 10.1007/s11263-023-01758-1
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was supported by the Academy of Finland (Academy Professor project EmotionAI with grant numbers 336116 and 345122, and ICT2023 project with grant number 345948), the National Natural Science Foundation of China (Grant No. 62002283), HKU Startup Fund, HKU Seed Fund for Basic Research, and the EPSRC grant: Turing AI Fellowship: EP/W002981/1, EPSRC/MURI grant EP/N019474/1. We would also like to thank the Royal Academy of Engineering and FiveAI. Open Access funding provided by University of Oulu including Oulu University Hospital.
Academy of Finland Grant Number: 336116
Detailed Information: 336116 (Academy of Finland Funding decision)
345122 (Academy of Finland Funding decision)
345948 (Academy of Finland Funding decision)
Copyright information: © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit