Deep learning-based remote-photoplethysmography measurement from short-time facial video

Li, Bin; Jiang, Wei; Peng, Jinye; Li, Xiaobai

URL: https://doi.org/10.1088/1361-6579/ac98f1

Publisher: IOP Publishing

Date: 2022-11-03

Li, B., Jiang, W., Peng, J., & Li, X. (2022). Deep learning-based remote-photoplethysmography measurement from short-time facial video. Physiological Measurement, 43(11), 115003. https://doi.org/10.1088/1361-6579/ac98f1

https://creativecommons.org/licenses/by-nc-nd/4.0/
This is the Accepted Manuscript version of an article accepted for publication in Physiological Measurement. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at https://doi.org/10.1088/1361-6579/ac98f1. This Accepted Manuscript is available for reuse under a CC BY-NC-ND licence after the 12 month embargo period provided that all the terms of the licence are adhered to.

The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2023041135743

Abstract

Objective: Efficient non-contact heart rate (HR) measurement from facial video has received much attention in health monitoring. Past methods relied on prior knowledge and an unproven hypothesis to extract remote photoplethysmography (rPPG) signals, e.g. manually designed regions of interest (ROIs) and the skin reflection model.

Approach: This paper presents a short-time, end-to-end HR estimation framework based on facial features and the temporal relationships between video frames. In the proposed method, a deep 3D multi-scale network with a cross-layer residual structure is designed to construct an autoencoder and extract robust rPPG features. Then, a spatial-temporal fusion mechanism is proposed to help the network focus on features related to rPPG signals. Both shallow and fused 3D spatial-temporal features are distilled to suppress redundant information in complex environments. Finally, a data augmentation strategy is presented to address the uneven distribution of HR values in existing datasets.

Main results: Experimental results on four face-rPPG datasets show that our method outperforms the state-of-the-art methods and requires fewer video frames. Compared with the previous best results, the proposed method improves the root mean square error (RMSE) by 5.9%, 3.4% and 21.4% on the OBF dataset (intra-test), COHFACE dataset (intra-test) and UBFC dataset (cross-test), respectively.
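As an illustration of the metric quoted above, the following minimal sketch computes the RMSE between predicted and reference heart rates and the percentage improvement of one RMSE over another. It assumes the reported percentages are relative reductions in RMSE, which the abstract does not spell out. The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences (in bpm)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction of new_rmse relative to baseline_rmse (assumed definition)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical heart rates in beats per minute, for illustration only.
reference_hr = [70, 80, 90]
baseline_hr = [72, 77, 93]   # made-up output of an earlier method
proposed_hr = [71, 78, 92]   # made-up output of a newer method

baseline_error = rmse(baseline_hr, reference_hr)
proposed_error = rmse(proposed_hr, reference_hr)
print(f"baseline RMSE: {baseline_error:.3f} bpm")
print(f"proposed RMSE: {proposed_error:.3f} bpm")
print(f"improvement: {relative_improvement(baseline_error, proposed_error):.1f}%")
```

Under this definition, a lower RMSE for the newer method yields a positive improvement percentage.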

Significance: Our method achieves good results on diverse datasets (i.e. with highly compressed video, low resolution and illumination variation), demonstrating that it can extract stable rPPG signals from short video clips.
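To make concrete how a heart rate can be read from a short rPPG signal, the sketch below uses a classical spectral-peak baseline: it scans candidate heart rates and picks the one with the highest power. This is not the deep network described in the abstract. The function name, the 40 to 180 bpm search range, the 30 fps frame rate and the synthetic 5-second signal are all assumptions chosen for illustration.

```python
import cmath
import math


def estimate_hr_bpm(signal, fps, lo_bpm=40, hi_bpm=180, step_bpm=1):
    """Return the candidate heart rate (bpm) with the highest spectral power."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]  # remove the DC component
    best_bpm, best_power = None, -1.0
    bpm = lo_bpm
    while bpm <= hi_bpm:
        freq_hz = bpm / 60.0
        # Single-frequency discrete Fourier transform evaluated at freq_hz.
        acc = sum(
            x * cmath.exp(-2j * math.pi * freq_hz * n / fps)
            for n, x in enumerate(centered)
        )
        power = abs(acc) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
        bpm += step_bpm
    return best_bpm


# Synthetic, noise-free 5-second pulse at 72 bpm sampled at 30 frames per second.
fps = 30
true_bpm = 72
pulse = [
    math.sin(2 * math.pi * (true_bpm / 60.0) * n / fps)
    for n in range(5 * fps)
]
estimated = estimate_hr_bpm(pulse, fps)
print(f"estimated heart rate: {estimated} bpm")
```

Scanning a fine grid of candidate frequencies, rather than using only the bins of a standard FFT, is one way to work around the coarse frequency resolution of short windows. Real rPPG signals are noisy, so a baseline like this would normally be preceded by band-pass filtering.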

Unless otherwise stated, this material is licensed under https://creativecommons.org/licenses/by-nc-nd/4.0/