University of Oulu

Saitoh T., Zhou Z., Zhao G., Pietikäinen M. (2017) Concatenated Frame Image Based CNN for Visual Speech Recognition. In: Chen CS., Lu J., Ma KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham.

Concatenated frame image based CNN for visual speech recognition

Author: Saitoh, Takeshi¹; Zhou, Ziheng²; Zhao, Guoying²; Pietikäinen, Matti²
Organizations: ¹Kyushu Institute of Technology, Iizuka, Japan
²University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 2.1 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe201902276383
Language: English
Published: Springer Nature, 2017
Publish Date: 2019-02-27
Description:

Abstract

This paper proposes a novel image-sequence representation called the concatenated frame image (CFI), two data augmentation methods for CFI, and a CFI-based convolutional neural network (CNN) framework for the visual speech recognition (VSR) task. CFI is simple, yet it contains the spatial-temporal information of a whole image sequence. The proposed method was evaluated on the public database OuluVS2, a multi-view audio-visual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions. As a result, the proposed method obtained high recognition accuracy.


Series: Lecture notes in computer science
ISSN: 0302-9743
ISSN-E: 1611-3349
ISSN-L: 0302-9743
ISBN: 978-3-319-54427-4
ISBN Print: 978-3-319-54426-7
Pages: 277-289
DOI: 10.1007/978-3-319-54427-4_21
OADOI: https://oadoi.org/10.1007/978-3-319-54427-4_21
Host publication: Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II
Host publication editor: Chen, Chu-Song
Lu, Jiwen
Ma, Kai-Kuang
Conference: Asian Conference on Computer Vision
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This work was supported by JSPS KAKENHI Grant Numbers 15K12601 and 16H03211.
Copyright information: © Springer International Publishing AG 2017. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-319-54427-4_21.