W. C. de Melo, E. Granger and A. Hadid, "A Deep Multiscale Spatiotemporal Network for Assessing Depression From Facial Dynamics," in IEEE Transactions on Affective Computing, vol. 13, no. 3, pp. 1581-1592, 1 July-Sept. 2022, doi: 10.1109/TAFFC.2020.3021755.
A deep multiscale spatiotemporal network for assessing depression from facial dynamics
|Author:||Carneiro de Melo, Wheidima¹; Granger, Eric²; Hadid, Abdenour¹|
¹Computer Science and Engineering, University of Oulu, 6370 Oulu, Oulu Finland
²Génie de la production automatisée, École de technologie supérieure, Montreal, Quebec, Canada H3C 1K3
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe2022081855738
Institute of Electrical and Electronics Engineers
|Publish Date:|| 2022-08-18
Recently, deep learning models have been successfully employed in video-based affective computing applications. One key application is automatic depression recognition from facial expressions. State-of-the-art approaches to recognizing depression typically explore spatial and temporal information separately: they use convolutional neural networks (CNNs) to analyze appearance information, and then either map feature variations or average the depression level over video frames. This approach is limited in its ability to represent dynamic information that can help discriminate between depression levels. In contrast, 3D CNN-based models can directly encode spatio-temporal relationships, although these models rely on fixed-range temporal information and a single receptive field. This reliance limits their ability to capture facial expression variations over diverse ranges and to exploit diverse facial areas. In this paper, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that the MSN is effective, outperforming state-of-the-art methods in automatic depression recognition.
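The idea of parallel branches with different temporal depths and receptive field sizes can be illustrated with a toy example. The sketch below is not the authors' MSN: it replaces learned 3D convolution kernels with fixed mean (box) filters, and the function names, the `BRANCHES` configuration, and the toy video are all assumptions made for illustration. It only shows how branches of different sizes respond differently to the same event in a video volume.

```python
from itertools import product


def box_filter_3d(video, kt, ks):
    """Mean filter over a kt x ks x ks window with zero padding.

    Stand-in for a learned 3D convolution (assumption: odd kt and ks,
    so the output keeps the same T x H x W size as the input).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    pt, ps = kt // 2, ks // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t, y, x in product(range(T), range(H), range(W)):
        total = 0.0
        offsets = product(range(-pt, pt + 1), range(-ps, ps + 1), range(-ps, ps + 1))
        for dt, dy, dx in offsets:
            tt, yy, xx = t + dt, y + dy, x + dx
            # Positions outside the volume count as zero (zero padding).
            if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                total += video[tt][yy][xx]
        out[t][y][x] = total / (kt * ks * ks)
    return out


def multiscale_block(video, branches):
    """Run every (temporal depth, spatial size) branch on the same input
    and stack the results as separate feature channels."""
    return [box_filter_3d(video, kt, ks) for kt, ks in branches]


# Hypothetical branch configuration: (temporal depth, spatial kernel size).
BRANCHES = [(1, 3), (3, 3), (3, 5), (5, 5)]

# Toy "video": 6 frames of 8 x 8 pixels with one bright pixel in frame 3.
video = [[[0.0] * 8 for _ in range(8)] for _ in range(6)]
video[3][4][4] = 1.0

features = multiscale_block(video, BRANCHES)
```

In this toy setting, the branch with temporal depth 1 responds only in the frame where the event occurs, while the deeper branches also respond in neighbouring frames, which is the kind of multi-range temporal coverage the abstract describes.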
IEEE transactions on affective computing
|Pages:||1581 - 1592|
|Type of Publication:|| A1 Journal article – refereed
|Field of Science:|| 113 Computer and information sciences
This research was partially supported by the Academy of Finland and the Natural Sciences and Engineering Research Council of Canada.
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.