
W. C. de Melo, E. Granger and A. Hadid, "A Deep Multiscale Spatiotemporal Network for Assessing Depression From Facial Dynamics," in IEEE Transactions on Affective Computing, vol. 13, no. 3, pp. 1581-1592, 1 July-Sept. 2022, doi: 10.1109/TAFFC.2020.3021755.

A deep multiscale spatiotemporal network for assessing depression from facial dynamics

Author: Carneiro de Melo, Wheidima1; Granger, Eric2; Hadid, Abdenour1
Organizations: 1Computer Science and Engineering, University of Oulu, Oulu, Finland
2Génie de la production automatisée, École de technologie supérieure, Montreal, Quebec, Canada H3C 1K3
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 7.7 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2022081855738
Language: English
Published: Institute of Electrical and Electronics Engineers, 2020
Publish Date: 2022-08-18
Description:

Abstract

Recently, deep learning models have been successfully employed in video-based affective computing applications. One key application is automatic depression recognition from facial expressions. State-of-the-art approaches for depression recognition typically explore spatial and temporal information separately, using convolutional neural networks (CNNs) to analyze appearance information and then either mapping feature variations or averaging the depression level over video frames. This approach is limited in its ability to represent the dynamic information that helps discriminate between depression levels. In contrast, 3D CNN-based models can directly encode spatiotemporal relationships, although these models rely on a fixed temporal range and a single receptive field size, which limits their ability to capture facial expression variations over diverse ranges and to exploit diverse facial areas. In this paper, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and sizes of receptive field, which allows the MSN to explore a wide range of spatiotemporal variations in facial expressions. Experimental results on two benchmark datasets show that our MSN is effective, outperforming state-of-the-art methods in automatic depression recognition.
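The sketch below (not the authors' code) illustrates the multiscale idea described in the abstract: parallel 3D convolutions with different temporal depths and spatial receptive field sizes, concatenated along the channel axis. The branch count, channel widths, and kernel choices are illustrative assumptions, not values from the paper.

```python
# Minimal PyTorch sketch of a multiscale spatiotemporal block.
# Assumption: four parallel 3D-conv branches with different (temporal, spatial)
# kernel sizes, concatenated on the channel dimension.
import torch
import torch.nn as nn


class MultiscaleSpatiotemporalBlock(nn.Module):
    def __init__(self, in_channels: int, branch_channels: int = 16):
        super().__init__()
        # Each branch uses a different (temporal depth, spatial size) kernel,
        # so the block covers short/long facial dynamics and small/large regions.
        kernel_sizes = [(3, 3, 3), (5, 3, 3), (3, 5, 5), (7, 5, 5)]
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(
                    in_channels,
                    branch_channels,
                    kernel_size=k,
                    padding=tuple(s // 2 for s in k),  # keep T, H, W unchanged
                ),
                nn.BatchNorm3d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)  # 2 clips of 16 RGB frames
    block = MultiscaleSpatiotemporalBlock(in_channels=3)
    print(block(clip).shape)  # torch.Size([2, 64, 16, 112, 112])
```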


Series: IEEE transactions on affective computing
ISSN: 2371-9850
ISSN-E: 1949-3045
ISSN-L: 2371-9850
Volume: 13
Issue: 3
Pages: 1581 - 1592
DOI: 10.1109/TAFFC.2020.3021755
OADOI: https://oadoi.org/10.1109/TAFFC.2020.3021755
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This research was partially supported by the Academy of Finland and the Natural Sciences and Engineering Research Council of Canada.
Copyright information: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.