University of Oulu

C. Sheng et al., "Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading," in IEEE Transactions on Multimedia, vol. 25, pp. 6563-6574, 2023, doi: 10.1109/TMM.2022.3210761

Importance-aware information bottleneck learning paradigm for lip reading

Author: Sheng, Changchong (1); Liu, Li (2); Deng, Wanxia (1);
Organizations: (1) College of Electronic Science and Technology, National University of Defense Technology (NUDT), China
(2) College of Systems Engineering, NUDT
(3) Center for Machine Vision and Signal Analysis, University of Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 5.3 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe20231004138661
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2023-10-04
Description:

Abstract

Lip reading is the task of decoding text from speakers’ mouth movements. Numerous deep learning-based methods have been proposed to address this task. However, existing deep lip reading models generalize poorly because they overfit the training data. To resolve this issue, we present a novel learning paradigm that aims to improve the interpretability and generalization of lip reading models. Specifically, a Variational Temporal Mask (VTM) module is designed to automatically analyze the importance of frame-level features. Furthermore, prediction consistency constraints between global information and locally important temporal features are introduced to strengthen model generalization. We evaluate the learning paradigm with multiple lip reading baseline models on the LRW and LRW-1000 datasets. Experiments show that the proposed framework significantly improves the generalization performance and interpretability of lip reading models.
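
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea it describes: score each frame's importance, pool features with and without those scores, and penalize disagreement between the two predictions. It is not the authors' VTM module. The variational (sampling) component is omitted, and every name here (`frame_importance`, `consistency_loss`, the linear scorer, the KL-based penalty) is an assumption made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def frame_importance(features, score_weights):
    """One importance value in (0, 1) per frame (hypothetical linear scorer)."""
    return [sigmoid(dot(f, score_weights)) for f in features]

def pool(features, weights):
    """Weighted average of frame-level features over time."""
    total = sum(weights)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)]

def classify(pooled, class_weights):
    """Toy linear classifier followed by softmax."""
    return softmax([dot(pooled, w) for w in class_weights])

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def consistency_loss(features, score_weights, class_weights):
    """KL between the prediction from all frames (global) and the
    prediction from importance-weighted frames (local)."""
    uniform = [1.0] * len(features)
    global_pred = classify(pool(features, uniform), class_weights)
    mask = frame_importance(features, score_weights)
    local_pred = classify(pool(features, mask), class_weights)
    return kl_divergence(global_pred, local_pred), mask

# Three frames with two-dimensional features, two classes (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss, mask = consistency_loss(features, [0.5, -0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the per-frame `mask` values double as a rough interpretability signal, since frames with higher scores contribute more to the local prediction.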

Series: IEEE transactions on multimedia
ISSN: 1520-9210
ISSN-E: 1941-0077
ISSN-L: 1520-9210
Volume: 25
Pages: 6563 - 6574
DOI: 10.1109/tmm.2022.3210761
OADOI: https://oadoi.org/10.1109/tmm.2022.3210761
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was partially supported by the National Key R&D Program of China No.2021YFB3100800, the Academy of Finland under grant 331883 and the National Natural Science Foundation of China under Grant 61872379.
Academy of Finland Grant Number: 331883
Detailed Information: 331883 (Academy of Finland Funding decision)
Copyright information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.