Towards reading beyond faces for sparsity-aware 3D/4D affect recognition |
|
Author: | Behzad, Muzammil¹; Vo, Nhat¹; Li, Xiaobai¹ |
Organizations: |
¹ Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland |
Format: | article |
Version: | published version |
Access: | open |
Online Access: | PDF Full Text (PDF, 2.7 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2021100649452 |
Language: | English |
Published: | Elsevier, 2021 |
Publish Date: | 2021-10-06 |
Description: |
In this paper, we present a sparsity-aware deep network for automatic 3D/4D facial expression recognition (FER). We first propose a novel augmentation method to combat the data limitation problem in deep learning, specifically for 3D/4D face meshes. This is achieved by projecting the input data into RGB and depth-map images and then iteratively performing randomized channel concatenation. We also introduce an effective way to capture the facial muscle movements encoded in the given 3D landmarks from three orthogonal planes (TOP): the TOP-landmarks over multi-views. Importantly, we then present a sparsity-aware deep network that computes sparse representations of convolutional features over multi-views. This not only yields higher recognition accuracy but is also computationally convenient. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network for 4D data, and a pre-trained network for 3D data. Refined predictions are obtained when the learned features collaborate over multi-views. Extensive experimental results on the Bosphorus, BU-3DFE, BU-4DFE and BP4D-Spontaneous datasets show the advantage of our method over state-of-the-art methods and demonstrate its effectiveness by reaching a promising accuracy of 99.69% on BU-4DFE for 4D FER.
|
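The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, assuming the mesh has already been projected into an RGB image and a depth map; the function name randomized_channel_concat, the channel pool, and the sampling-with-replacement scheme are assumptions for illustration, not the authors' exact procedure.

# Minimal sketch of randomized channel concatenation for augmenting
# projected 3D/4D face data (illustrative only; not the paper's code).
import numpy as np

def randomized_channel_concat(rgb, depth, n_augmented=10, n_channels=3, seed=None):
    # rgb:   (H, W, 3) RGB projection of one face mesh
    # depth: (H, W)    depth-map projection of the same mesh
    rng = np.random.default_rng(seed)
    # Candidate channel pool: the three RGB channels plus the depth map.
    pool = [rgb[..., c] for c in range(rgb.shape[-1])] + [depth]
    augmented = []
    for _ in range(n_augmented):
        # Randomly choose (with replacement) which channels to stack this round.
        idx = rng.integers(0, len(pool), size=n_channels)
        augmented.append(np.stack([pool[i] for i in idx], axis=-1))
    return augmented  # list of (H, W, n_channels) augmented images

# Example on dummy data standing in for one projected frame.
rgb = np.random.rand(224, 224, 3).astype(np.float32)
depth = np.random.rand(224, 224).astype(np.float32)
samples = randomized_channel_concat(rgb, depth, n_augmented=4, seed=0)
print(len(samples), samples[0].shape)  # -> 4 (224, 224, 3)

Repeating this per frame (and per view) multiplies the number of training samples without collecting new 3D/4D scans, which is the stated purpose of the augmentation.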
Series: |
Neurocomputing |
ISSN: | 0925-2312 |
ISSN-E: | 1872-8286 |
ISSN-L: | 0925-2312 |
Volume: | 458 |
Pages: | 297 - 307 |
DOI: | 10.1016/j.neucom.2021.06.023 |
OADOI: | https://oadoi.org/10.1016/j.neucom.2021.06.023 |
Type of Publication: |
A1 Journal article – refereed |
Field of Science: |
113 Computer and information sciences |
Funding: |
This work was supported by Infotech Oulu and by the Academy of Finland through project MiGA (grant 316765), project 6 + E (grant 323287), and the ICT 2023 project (grant 328115). Financial support from the Riitta ja Jorma J. Takanen Foundation and the Tauno Tönning Foundation is also acknowledged. Finally, the authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources. |
Academy of Finland Grant Number: |
316765, 323287, 328115 |
Detailed Information: |
316765 (Academy of Finland Funding decision); 323287 (Academy of Finland Funding decision); 328115 (Academy of Finland Funding decision) |
Copyright information: |
© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |