A deep multiscale spatiotemporal network for assessing depression from facial dynamics
Carneiro de Melo, Wheidima; Granger, Eric; Hadid, Abdenour (2020-09-04)
W. C. de Melo, E. Granger and A. Hadid, "A Deep Multiscale Spatiotemporal Network for Assessing Depression From Facial Dynamics," in IEEE Transactions on Affective Computing, vol. 13, no. 3, pp. 1581-1592, 1 July-Sept. 2022, doi: 10.1109/TAFFC.2020.3021755.
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
https://urn.fi/URN:NBN:fi-fe2022081855738
Abstract
Recently, deep learning models have been successfully employed in video-based affective computing applications. One key application is automatic depression recognition from facial expressions. State-of-the-art approaches to depression recognition typically explore spatial and temporal information separately. They use convolutional neural networks (CNNs) to analyze appearance information, and then either map feature variations or average the depression level over video frames. This approach is limited in its ability to represent the dynamic information that can help to discriminate between depression levels. In contrast, 3D CNN-based models can directly encode spatio-temporal relationships, although these models rely on fixed-range temporal information and a single receptive field. This limits their ability to capture facial expression variations over diverse temporal ranges and to exploit diverse facial areas. In this paper, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that our MSN is effective, outperforming state-of-the-art methods in automatic depression recognition.
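The core idea in the abstract, parallel 3D convolutions that differ in temporal depth and spatial receptive field, can be illustrated with a minimal NumPy sketch. It is not the MSN's published configuration. The helper names `conv3d_same` and `multiscale_block`, the branch kernel sizes, the random (untrained) weights, the ReLU activation, and channel-wise concatenation of branch outputs are all illustrative assumptions.

```python
import numpy as np


def conv3d_same(x, w):
    """Naive 3D cross-correlation with zero 'same' padding (hypothetical helper).

    x: input clip, shape (C_in, T, H, W)
    w: kernel, shape (C_out, C_in, kT, kH, kW), with odd kernel sizes
    returns: array of shape (C_out, T, H, W)
    """
    c_out, _, kt, kh, kw = w.shape
    _, t, h, wd = x.shape
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    # Zero-pad time and space so every branch keeps the input's (T, H, W).
    xp = np.pad(x, ((0, 0), (pt, pt), (ph, ph), (pw, pw)))
    out = np.zeros((c_out, t, h, wd))
    for dt in range(kt):
        for dh in range(kh):
            for dw in range(kw):
                # Input window aligned with kernel offset (dt, dh, dw).
                patch = xp[:, dt:dt + t, dh:dh + h, dw:dw + wd]
                # Mix input channels into output channels for this offset.
                out += np.einsum("oc,cthw->othw", w[:, :, dt, dh, dw], patch)
    return out


def multiscale_block(x, branch_kernels, c_out_per_branch, rng):
    """Hypothetical multiscale spatiotemporal block.

    Each (kT, kS) pair defines one parallel branch with temporal depth kT and
    a kS x kS spatial receptive field. Branch outputs go through ReLU and are
    concatenated along the channel axis. Weights are random, not learned.
    """
    c_in = x.shape[0]
    outputs = []
    for kt, ks in branch_kernels:
        w = rng.standard_normal((c_out_per_branch, c_in, kt, ks, ks)) * 0.1
        outputs.append(np.maximum(conv3d_same(x, w), 0.0))
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
clip = rng.standard_normal((3, 8, 16, 16))  # (channels, frames, height, width)
# Assumed branch configuration: (temporal depth, spatial kernel size).
branches = [(1, 3), (3, 3), (3, 5), (5, 5)]
features = multiscale_block(clip, branches, c_out_per_branch=4, rng=rng)
print(features.shape)  # (16, 8, 16, 16)
```

The "same" padding keeps every branch at the input's temporal and spatial size, so the branch outputs can be stacked along the channel axis. In this sketch a branch with temporal depth 1 sees single frames (appearance only), while deeper temporal kernels and larger spatial kernels cover longer facial motions and wider facial regions. A trained model would learn these weights rather than sample them.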