Li, B., Jiang, W., Peng, J., & Li, X. (2022). Deep learning-based remote-photoplethysmography measurement from short-time facial video. Physiological Measurement, 43(11), 115003. https://doi.org/10.1088/1361-6579/ac98f1
Deep learning-based remote-photoplethysmography measurement from short-time facial video
|Author:||Li, Bin1; Jiang, Wei1; Peng, Jinye1; Li, X.2|
1School of Information Science and Technology, Northwest University, Xi'an, People's Republic of China
2Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
|Persistent link:||http://urn.fi/urn:nbn:fi-fe2023041135743|
|Publish Date:||2023-11-03|
Objective: Efficient non-contact heart rate (HR) measurement from facial video has received much attention in health monitoring. Past methods relied on prior knowledge and unproven hypotheses to extract remote photoplethysmography (rPPG) signals, e.g. manually designed regions of interest (ROIs) and the skin reflection model.
Approach: This paper presents a short-time, end-to-end HR estimation framework based on facial features and the temporal relationships of video frames. In the proposed method, a deep 3D multi-scale network with a cross-layer residual structure is designed to construct an autoencoder and extract robust rPPG features. Then, a spatial-temporal fusion mechanism is proposed to help the network focus on features related to rPPG signals. Both shallow and fused 3D spatial-temporal features are distilled to suppress redundant information in complex environments. Finally, a data augmentation strategy is presented to address the uneven distribution of HR values in existing datasets.
Main results: The experimental results on four face-rPPG datasets show that our method outperforms state-of-the-art methods while requiring fewer video frames. Compared with the previous best results, the proposed method improves the root mean square error (RMSE) by 5.9%, 3.4% and 21.4% on the OBF dataset (intra-test), COHFACE dataset (intra-test) and UBFC dataset (cross-test), respectively.
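The RMSE figures above are the standard error metric for HR estimation: the square root of the mean squared difference between predicted and reference HR values, in beats per minute. A minimal sketch of the computation, using hypothetical per-clip HR estimates and ground-truth values (not from the paper):

```python
import math

def rmse(predicted, ground_truth):
    """Root mean square error between predicted and reference HR values (bpm)."""
    assert len(predicted) == len(ground_truth)
    return math.sqrt(
        sum((p - g) ** 2 for p, g in zip(predicted, ground_truth)) / len(predicted)
    )

# Hypothetical per-clip estimates vs. ECG-derived ground truth (bpm)
pred = [72.1, 80.4, 65.0, 90.2]
truth = [70.0, 81.0, 66.5, 88.0]
print(round(rmse(pred, truth), 3))  # → 1.722
```

A relative RMSE improvement of 5.9%, as reported for OBF, means the new RMSE is 94.1% of the previous best value on the same test protocol.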
Significance: Our method achieves good results on diverse datasets (i.e. highly compressed video, low resolution, and illumination variation), demonstrating that our method can extract stable rPPG signals from short-time video.
|Type of Publication:||
A1 Journal article – refereed
|Field of Science:||
113 Computer and information sciences
This work was supported by the China Postdoctoral Science Foundation (Program No. 2020M683696XB); The Xi’an Key Laboratory of Intelligent Perception and Cultural Inheritance (No. 2019219614SYS011CG033); Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2021JQ-455).
This is the Accepted Manuscript version of an article accepted for publication in Physiological Measurement. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at https://doi.org/10.1088/1361-6579/ac98f1. This Accepted Manuscript is available for reuse under a CC BY-NC-ND licence after the 12 month embargo period provided that all the terms of the licence are adhered to.