Concatenated frame image based CNN for visual speech recognition |
|
Author: | Saitoh, Takeshi (1); Zhou, Ziheng (2); Zhao, Guoying (2) |
Organizations: |
(1) Kyushu Institute of Technology, Iizuka, Japan; (2) University of Oulu, Oulu, Finland |
Format: | article |
Version: | accepted version |
Access: | open |
Online Access: | PDF Full Text (PDF, 2.1 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe201902276383 |
Language: | English |
Published: |
Springer Nature, 2017 |
Publish Date: | 2019-02-27 |
Description: |
Abstract: This paper proposes a novel sequence image representation called the concatenated frame image (CFI), two data augmentation methods for CFI, and a CFI-based convolutional neural network (CNN) framework for the visual speech recognition (VSR) task. CFI is simple, yet it contains the spatio-temporal information of a whole image sequence. The proposed method was evaluated on the public OuluVS2 database, a multi-view audio-visual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions. As a result, the proposed method obtained high recognition accuracy.
|
Series: |
Lecture notes in computer science |
ISSN: | 0302-9743 |
ISSN-E: | 1611-3349 |
ISSN-L: | 0302-9743 |
ISBN: | 978-3-319-54427-4 |
ISBN Print: | 978-3-319-54426-7 |
Pages: | 277 - 289 |
DOI: | 10.1007/978-3-319-54427-4_21 |
OADOI: | https://oadoi.org/10.1007/978-3-319-54427-4_21 |
Host publication: |
Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II |
Host publication editor: |
Chen, Chu-Song; Lu, Jiwen; Ma, Kai-Kuang |
Conference: |
Asian Conference on Computer Vision |
Type of Publication: |
A4 Article in conference proceedings |
Field of Science: |
113 Computer and information sciences; 213 Electronic, automation and communications engineering, electronics |
Funding: |
This work was supported by JSPS KAKENHI Grant Numbers 15K12601 and 16H03211. |
Copyright information: |
© Springer International Publishing AG 2017. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-319-54427-4_21.
|