Combining global and local convolutional 3D networks for detecting depression from facial expressions

de Melo, Wheidima Carneiro; Granger, Eric; Hadid, Abdenour

de Melo, Wheidima Carneiro; Granger, Eric; Hadid, Abdenour (2019-07-11)

URL:

https://doi.org/10.1109/FG.2019.8756568

Institute of Electrical and Electronics Engineers

11.07.2019

W. C. de Melo, E. Granger and A. Hadid, "Combining Global and Local Convolutional 3D Networks for Detecting Depression from Facial Expressions," 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 2019, pp. 1-8. doi: 10.1109/FG.2019.8756568

https://rightsstatements.org/vocab/InC/1.0/
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe202003248955

Abstract

Deep learning architectures have been successfully applied in video-based health monitoring to recognize distinctive variations in the facial appearance of subjects. To detect patterns of variation linked to depressive behavior, deep neural networks (NNs) typically exploit spatial and temporal information separately by, e.g., cascading a 2D convolutional NN (CNN) with a recurrent NN (RNN), although this can degrade the intrinsic spatio-temporal relationships. With the recent advent of 3D CNNs like the convolutional 3D (C3D) network, these spatio-temporal relationships can be modeled to improve performance. However, the accuracy of C3D networks remains an issue when applied to depression detection. In this paper, the fusion of diverse C3D predictions is proposed to improve accuracy, where spatio-temporal features are extracted from global (full-face) and local (eyes) regions of the subject. This allows the system to focus increasingly on a local facial region that is highly relevant for analyzing depression. Additionally, the proposed network integrates 3D Global Average Pooling to efficiently summarize spatio-temporal features without using fully-connected layers, thereby reducing the number of model parameters and potential over-fitting. Experimental results on the Audio Visual Emotion Challenge (AVEC 2013 and AVEC 2014) depression datasets indicate that combining the responses of global and local C3D networks achieves a higher level of accuracy than state-of-the-art systems.
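The two mechanisms named in the abstract, 3D Global Average Pooling and the fusion of global and local predictions, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the feature-map layout [channels][time][height][width], the layer sizes, and the weighted-average fusion rule are all assumptions made for illustration; the abstract does not state how the two predictions are combined.

```python
from statistics import fmean


def global_average_pool_3d(features):
    """Collapse a feature map laid out as [channels][time][height][width]
    into one average value per channel (3D Global Average Pooling)."""
    pooled = []
    for channel in features:
        values = [v for frame in channel for row in frame for v in row]
        pooled.append(fmean(values))
    return pooled


def fc_parameter_count(channels, time, height, width, units):
    """Weights plus biases of a fully-connected layer applied to the
    flattened feature map, i.e. the cost that pooling avoids."""
    return channels * time * height * width * units + units


def fuse_scores(global_score, local_score, global_weight=0.5):
    """Combine full-face and eye-region predictions.
    ASSUMPTION: a weighted average; the abstract does not give the rule."""
    return global_weight * global_score + (1.0 - global_weight) * local_score


# Toy feature map: 2 channels, 2 frames, 2x2 spatial grid.
toy_features = [
    [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]],
    [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]],
]
pooled = global_average_pool_3d(toy_features)  # one value per channel

# Hypothetical layer sizes, chosen only to show the scale of the saving.
avoided_parameters = fc_parameter_count(512, 1, 4, 4, 4096)

# Hypothetical depression scores from the global and local networks.
fused = fuse_scores(10.0, 14.0)
```

Pooling reduces each channel to a single number whatever the temporal and spatial extent of the feature map, which is why a large fully-connected layer over the flattened features is no longer needed.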

Collections

Open access [32011]