University of Oulu

Usman Muhammad, Zitong Yu, Jukka Komulainen, Self-supervised 2D face presentation attack detection via temporal sequence sampling, Pattern Recognition Letters, Volume 156, 2022, Pages 15-22, ISSN 0167-8655, https://doi.org/10.1016/j.patrec.2022.03.001

Self-supervised 2D face presentation attack detection via temporal sequence sampling

Author: Muhammad, Usman¹; Yu, Zitong¹; Komulainen, Jukka¹
Organizations: ¹Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.3 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2022041929477
Language: English
Published: Elsevier, 2022
Publish Date: 2022-04-19
Description:

Abstract

Conventional 2D face biometric systems are vulnerable to presentation attacks performed with different face artefacts, e.g., printouts, video replays and wearable 3D masks. The research focus in face presentation attack detection (PAD) has recently been shifting towards end-to-end learning of deep representations directly from annotated data rather than designing hand-crafted (low-level) features. However, even the state-of-the-art deep-learning-based face PAD models have shown unsatisfactory generalization performance when facing unknown attacks or acquisition conditions, owing to the lack of representative training and tuning data in the existing public benchmarks. To alleviate this issue, we propose a video pre-processing technique for 2D face PAD called Temporal Sequence Sampling (TSS), which removes the estimated inter-frame 2D affine motion in the view and encodes the appearance and dynamics of the resulting smoothed video sequence into a single RGB image. Furthermore, we leverage the features of a Convolutional Neural Network (CNN) by introducing a self-supervised representation learning scheme in which the labels are generated automatically by the TSS method: the stabilized frames accumulated over video clips of different temporal lengths provide the supervision. The learnt feature representations are then fine-tuned for the downstream task using labelled face PAD data. Our extensive experiments on four public benchmarks, namely Replay-Attack, MSU-MFSD, CASIA-FASD and OULU-NPU, demonstrate that the proposed framework provides promising generalization capability and encourage further study in this domain.
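
The following is a minimal NumPy sketch of the idea described in the abstract, not the authors' implementation. It assumes that matched point pairs between frames are already available, that "accumulation" means a plain per-pixel average of the stabilized frames, and that the self-supervised label is simply the index of the clip length used. All function names and the default clip lengths are hypothetical.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine A with dst ~= A @ [x, y, 1] (hypothetical helper).

    src, dst: (N, 2) arrays of matched (x, y) points, N >= 3.
    """
    ones = np.ones((src.shape[0], 1))
    design = np.hstack([src, ones])                      # (N, 3)
    sol, *_ = np.linalg.lstsq(design, dst, rcond=None)   # (3, 2)
    return sol.T                                         # (2, 3)

def warp_nearest(frame, A):
    """Stabilize a frame: A maps reference (x, y) to this frame's (x, y).

    Each output pixel is fetched from the mapped location using
    nearest-neighbour sampling; out-of-range coordinates are clipped.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w)
    sx, sy = np.rint(A @ coords).astype(int)
    sx = np.clip(sx, 0, w - 1)
    sy = np.clip(sy, 0, h - 1)
    return frame[sy, sx].reshape(frame.shape)

def accumulate_clip(frames, transforms):
    """Encode a clip as one image by averaging its stabilized frames.

    Averaging is an assumption; the abstract only says frames are accumulated.
    """
    stabilized = [warp_nearest(f, A) for f, A in zip(frames, transforms)]
    return np.mean(stabilized, axis=0).astype(np.uint8)

def make_pretext_samples(frames, transforms, clip_lengths=(2, 4, 8)):
    """Build (image, pseudo_label) pairs, one per clip length.

    The pseudo-label is the index of the clip length (an assumed labelling).
    """
    return [
        (accumulate_clip(frames[:n], transforms[:n]), label)
        for label, n in enumerate(clip_lengths)
    ]
```

In a sketch like this, the resulting image and pseudo-label pairs would be used to pre-train a CNN classifier before fine-tuning on labelled PAD data. In practice the affine motion would more likely be estimated from tracked features with a robust estimator than from given correspondences.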

Series: Pattern recognition letters
ISSN: 0167-8655
ISSN-E: 1872-7344
ISSN-L: 0167-8655
Volume: 156
Pages: 15 - 22
DOI: 10.1016/j.patrec.2022.03.001
OADOI: https://oadoi.org/10.1016/j.patrec.2022.03.001
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: The financial support of the Tauno Tönning Foundation is gratefully acknowledged.
Copyright information: © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).