Bidirectional long short-term memory variational autoencoder

Shi, Henglin; Liu, Xin; Hong, Xiaopeng; Zhao, Guoying (2018-09-03)

URL:

http://bmvc2018.org/contents/papers/0963.pdf

BMVA Press

03.09.2018

Shi, H. & Liu, X. & Hong, X. & Zhao, G. (2019). Bidirectional long short-term memory variational autoencoder. In Proceedings of the British Machine Vision Conference 2018 (BMVC). 3rd - 6th September, Newcastle UK (pp. 1-11). http://bmvc2018.org/contents/papers/0963.pdf

https://rightsstatements.org/vocab/InC/1.0/
© 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe201902266257

Abstract

The Variational Autoencoder (VAE) has achieved promising success since its emergence. In recent years, many variants have been developed, especially works that extend the VAE to handle sequential data [1, 2, 5, 7]. However, these works either do not generate sequential latent variables or encode latent variables based only on inputs from earlier time-steps. We believe that in real-world situations, encoding a latent variable at a specific time-step should be based not only on previous observations but also on succeeding samples. In this work, we emphasize this fact and theoretically derive the bidirectional Long Short-Term Memory Variational Autoencoder (bLSTM-VAE), a novel variant of the VAE whose encoders and decoders are implemented by bidirectional Long Short-Term Memory (bLSTM) networks. The proposed bLSTM-VAE can encode a sequential input as an equal-length sequence of latent variables. The latent variable at a specific time-step is encoded by simultaneously processing the observations from the first time-step to the current time-step in forward order and the observations from the current time-step to the last time-step in backward order. As a result, we consider that the proposed bLSTM-VAE can learn latent variables reliably by mining contextual information from the whole input sequence. To validate the proposed method, we apply it to gesture recognition using 3D skeletal joint data. The evaluation is conducted on the ChaLearn Look at People gesture dataset and the NTU RGB+D dataset. The experimental results show that the classification network performs better when combined with the proposed bLSTM-VAE than when combined with a standard VAE, and that it also outperforms several state-of-the-art methods.
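The bidirectional encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a scalar tanh recurrence stands in for each LSTM direction, and the function names (`run_rnn`, `encode_bidirectional`, `sample_latents`), the `PARAMS` values, and the toy input are all hypothetical. The sketch only shows the dependency structure: the latent variable at time-step t is parameterized from a forward state summarizing the observations up to t and a backward state summarizing the observations from t onward, giving one latent variable per input time-step.

```python
import math
import random


def run_rnn(xs, w_in, w_rec):
    """Simple tanh recurrence (a stand-in for one LSTM direction).

    Returns one hidden state per time-step, in the order the inputs were given.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def encode_bidirectional(xs, params):
    """Return per-time-step (mu, log_var) lists built from both directions."""
    # fwd[t] summarizes the observations from the first time-step up to t.
    fwd = run_rnn(xs, params["w_in_f"], params["w_rec_f"])
    # Run over the reversed sequence, then flip back so that
    # bwd[t] summarizes the observations from t up to the last time-step.
    bwd = run_rnn(xs[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    mus = [params["a_mu"] * f + params["b_mu"] * b for f, b in zip(fwd, bwd)]
    log_vars = [params["a_lv"] * f + params["b_lv"] * b for f, b in zip(fwd, bwd)]
    return mus, log_vars


def sample_latents(mus, log_vars, rng):
    """Reparameterization: z_t = mu_t + sigma_t * eps_t with eps_t ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mus, log_vars)]


# Hypothetical, hand-picked weights; a real model would learn these.
PARAMS = {
    "w_in_f": 0.9, "w_rec_f": 0.5,   # forward recurrence
    "w_in_b": 0.8, "w_rec_b": 0.5,   # backward recurrence
    "a_mu": 0.7, "b_mu": 0.5,        # mean projection
    "a_lv": -0.3, "b_lv": -0.2,      # log-variance projection
}

xs = [0.1, -0.4, 0.7, 0.2, -0.3]     # toy one-dimensional sequence
mus, log_vars = encode_bidirectional(xs, PARAMS)
zs = sample_latents(mus, log_vars, random.Random(0))
```

In this sketch, changing only the last observation alters the mean of the first latent variable through the backward pass, whereas the forward state at the first time-step is unaffected. That is the contrast the abstract draws with encoders that condition only on earlier time-steps.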
