Concatenated frame image based CNN for visual speech recognition
Saitoh, Takeshi; Zhou, Ziheng; Zhao, Guoying; Pietikäinen, Matti (2017-03-16)
Saitoh T., Zhou Z., Zhao G., Pietikäinen M. (2017) Concatenated Frame Image Based CNN for Visual Speech Recognition. In: Chen CS., Lu J., Ma KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham.
© Springer International Publishing AG 2017. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-319-54427-4_21.
https://rightsstatements.org/vocab/InC/1.0/
https://urn.fi/URN:NBN:fi-fe201902276383
Abstract
This paper proposes a novel image-sequence representation called the concatenated frame image (CFI), two data augmentation methods for CFI, and a CFI-based convolutional neural network (CNN) framework for the visual speech recognition (VSR) task. CFI is simple, yet it contains the spatio-temporal information of a whole image sequence. The proposed method was evaluated on the public database OuluVS2, a multi-view audio-visual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions. The proposed method obtained high recognition accuracy.
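The idea of packing a whole image sequence into a single image can be illustrated with a minimal sketch. This record does not state how the paper samples or arranges frames, so everything below is an assumption made for illustration: the function names `sample_indices` and `build_cfi`, the uniform sampling of frames, the row-major grid layout, and the use of nested lists for grayscale frames (chosen only to keep the example dependency-free).

```python
def sample_indices(num_frames, num_tiles):
    """Pick num_tiles frame indices spread evenly over the sequence.

    Assumption: uniform sampling; the paper may use a different scheme.
    """
    if num_frames <= 0:
        raise ValueError("the frame sequence is empty")
    if num_tiles == 1:
        return [0]
    step = (num_frames - 1) / (num_tiles - 1)
    return [round(i * step) for i in range(num_tiles)]


def build_cfi(frames, rows, cols):
    """Tile rows * cols sampled frames into one 2-D image.

    frames: list of equally sized grayscale frames, each a list of pixel rows.
    Assumption: frames are placed left to right, top to bottom.
    """
    picked = [frames[i] for i in sample_indices(len(frames), rows * cols)]
    height = len(picked[0])
    cfi = []
    for r in range(rows):
        row_frames = picked[r * cols:(r + 1) * cols]
        for y in range(height):
            line = []
            # Join pixel row y of every frame in this grid row.
            for frame in row_frames:
                line.extend(frame[y])
            cfi.append(line)
    return cfi


# Toy sequence: 10 frames of 2 x 3 pixels, each filled with its frame index.
frames = [[[t] * 3 for _ in range(2)] for t in range(10)]
cfi = build_cfi(frames, rows=2, cols=2)
print(len(cfi), len(cfi[0]))
```

Because the temporal order of the frames becomes a spatial layout, an ordinary 2-D CNN applied to such an image can see appearance and motion cues together, which is the property the abstract attributes to CFI.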