University of Oulu

X. Wu, E. Granger, T. H. Kinnunen, X. Feng and A. Hadid, "Audio-Visual Kinship Verification in the Wild," 2019 International Conference on Biometrics (ICB), Crete, Greece, 2019, pp. 1-8, doi: 10.1109/ICB45273.2019.8987241

Audio-visual kinship verification in the wild

Author: Wu, Xiaoting1,2; Granger, Eric3; Kinnunen, Tomi H.4;
Organizations: 1Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
2School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
3Laboratoire d’imagerie, de vision et d’intelligence artificielle (LIVIA), Dept. of Systems Engineering, Ecole de technologie supérieure, Montreal, Canada
4School of Computing, University of Eastern Finland, Joensuu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.6 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2019
Publish Date: 2020-06-24


Kinship verification is a challenging problem in which recognition systems are trained to establish a kin relation between two individuals based on facial images or videos. However, due to variations in capture conditions (background, pose, expression, illumination and occlusion), state-of-the-art systems currently provide a low level of accuracy. As in many visual recognition and affective computing applications, kinship verification may benefit from a combination of discriminant information extracted from both video and audio signals. In this paper, we investigate for the first time the fusion of audio-visual information from both face and voice modalities to improve kinship verification accuracy. First, we propose a new multi-modal kinship dataset called TALking KINship (TALKIN), which comprises several pairs of video sequences with subjects talking. State-of-the-art conventional and deep learning models are then assessed and compared for kinship verification using this dataset. Finally, we propose a deep Siamese network for multi-modal fusion of kinship relations. Experiments with the TALKIN dataset indicate that the proposed Siamese network provides a significantly higher level of accuracy than baseline uni-modal and multi-modal fusion techniques for kinship verification. Results also indicate that audio (vocal) information is complementary and useful for the kinship verification problem.

Series: International Conference on Biometrics
ISSN: 2376-4201
ISSN-L: 2376-4201
ISBN: 978-1-7281-3640-0
ISBN Print: 978-1-7281-3641-7
Pages: 1 - 8
Article number: 8987241
DOI: 10.1109/ICB45273.2019.8987241
Host publication: 2019 International Conference on Biometrics, ICB 2019
Conference: International Conference on Biometrics
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This work is partially supported by the China Scholarship Council (grant 201706290103), the Academy of Finland, and the Natural Sciences and Engineering Research Council of Canada. The authors wish to acknowledge CSC-IT Center for Science, Finland, for the computational resources. The initial help from Dr. Miguel Bordallo López and Dr. Elhocine Boutellaa is also acknowledged.
Copyright information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.