Concatenated frame image based CNN for visual speech recognition

Saitoh, Takeshi; Zhou, Ziheng; Zhao, Guoying; Pietikäinen, Matti (2017-03-16)

URL: https://doi.org/10.1007/978-3-319-54427-4_21

Springer Nature

16.03.2017

Saitoh T., Zhou Z., Zhao G., Pietikäinen M. (2017) Concatenated Frame Image Based CNN for Visual Speech Recognition. In: Chen CS., Lu J., Ma KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham.

https://rightsstatements.org/vocab/InC/1.0/
© Springer International Publishing AG 2017. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part II. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-319-54427-4_21.

doi:https://doi.org/10.1007/978-3-319-54427-4_21

The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe201902276383

Abstract

This paper proposes a novel sequence image representation method called concatenated frame image (CFI), two types of data augmentation methods for CFI, and a CFI-based convolutional neural network (CNN) framework for the visual speech recognition (VSR) task. CFI is simple, yet it contains the spatio-temporal information of a whole image sequence. The proposed method was evaluated on the public database OuluVS2, a multi-view audio-visual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions. As a result, the proposed method obtained high recognition accuracy.
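
To make the idea of a concatenated frame image concrete, the following is a minimal sketch of how the frames of one sequence might be tiled into a single 2-D image that an ordinary image CNN can take as input. The abstract does not specify the grid layout, the number of frames, or how frames are sampled, so the function name `concatenated_frame_image`, the `rows` and `cols` parameters, the uniform sampling, and the row-major ordering are all assumptions made for illustration and not the authors' implementation.

```python
import numpy as np


def concatenated_frame_image(frames, rows=4, cols=4):
    """Tile rows*cols uniformly sampled frames into one 2-D image.

    frames: array of shape (T, H, W), e.g. a grayscale mouth-region sequence.
    The grid size and the uniform sampling are illustrative assumptions.
    """
    frames = np.asarray(frames)
    t, h, w = frames.shape
    n = rows * cols
    # Pick n frame indices spread evenly over the sequence
    # (indices repeat when the sequence has fewer than n frames).
    idx = np.linspace(0, t - 1, n).round().astype(int)
    sampled = frames[idx]                      # shape (n, H, W)
    # Row-major layout: first sampled frame top-left, last bottom-right.
    grid = sampled.reshape(rows, cols, h, w)   # shape (rows, cols, H, W)
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)
```

For a sequence of shape `(T, H, W)`, the result has shape `(rows * H, cols * W)`, so the temporal order of the sequence is encoded in the spatial position of each tile.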
