Audio-visual kinship verification in the wild |
|
Author: | Wu, Xiaoting1,2; Granger, Eric3; Kinnunen, Tomi H.4; |
Organizations: |
1Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
2School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
3Laboratoire d’imagerie, de vision et d’intelligence artificielle (LIVIA), Dept. of Systems Engineering, Ecole de technologie supérieure, Montreal, Canada
4School of Computing, University of Eastern Finland, Joensuu, Finland
|
Format: | article |
Version: | accepted version |
Access: | open |
Online Access: | PDF Full Text (PDF, 0.6 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2020062445593 |
Language: | English |
Published: |
Institute of Electrical and Electronics Engineers, 2019 |
Publish Date: | 2020-06-24 |
Description: |
Abstract: Kinship verification is a challenging problem, where recognition systems are trained to establish a kin relation between two individuals based on facial images or videos. However, due to variations in capture conditions (background, pose, expression, illumination and occlusion), state-of-the-art systems currently provide a low level of accuracy. As in many visual recognition and affective computing applications, kinship verification may benefit from a combination of discriminant information extracted from both video and audio signals. In this paper, we investigate for the first time the fusion of audio-visual information from both face and voice modalities to improve kinship verification accuracy. First, we propose a new multi-modal kinship dataset called TALking KINship (TALKIN), which comprises several pairs of video sequences with subjects talking. State-of-the-art conventional and deep learning models are assessed and compared for kinship verification using this dataset. Finally, we propose a deep Siamese network for multi-modal fusion of kinship relations. Experiments with the TALKIN dataset indicate that the proposed Siamese network provides a significantly higher level of accuracy than baseline uni-modal and multi-modal fusion techniques for kinship verification. Results also indicate that audio (vocal) information is complementary and useful for the kinship verification problem.
|
Series: |
International Conference on Biometrics |
ISSN: | 2376-4201 |
ISSN-L: | 2376-4201 |
ISBN: | 978-1-7281-3640-0 |
ISBN Print: | 978-1-7281-3641-7 |
Pages: | 1 - 8 |
Article number: | 8987241 |
DOI: | 10.1109/ICB45273.2019.8987241 |
OADOI: | https://oadoi.org/10.1109/ICB45273.2019.8987241 |
Host publication: |
2019 International Conference on Biometrics, ICB 2019 |
Conference: |
International Conference on Biometrics |
Type of Publication: |
A4 Article in conference proceedings |
Field of Science: |
113 Computer and information sciences 213 Electronic, automation and communications engineering, electronics |
Subjects: | |
Funding: |
This work is partially supported by the China Scholarship Council (grant 201706290103), the Academy of Finland, and the Natural Sciences and Engineering Research Council of Canada. The authors wish to acknowledge CSC-IT Center for Science, Finland, for the computational resources. The initial help from Dr. Miguel Bordallo López and Dr. Elhocine Boutellaa is also acknowledged. |
Copyright information: |
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |