Shi, H., Liu, X., Hong, X., & Zhao, G. (2019). Bidirectional long short-term memory variational autoencoder. In Proceedings of the British Machine Vision Conference 2018 (BMVC), 3–6 September, Newcastle, UK (pp. 1–11). http://bmvc2018.org/contents/papers/0963.pdf
Bidirectional long short-term memory variational autoencoder
|Author:||Shi, Henglin¹; Liu, Xin¹; Hong, Xiaopeng¹;|
¹Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe201902266257
|Publish Date:|| 2019-02-26
The Variational Autoencoder (VAE) has achieved considerable success since its introduction. In recent years, various variants have been developed, notably works that extend the VAE to sequential data [1, 2, 5, 7]. However, these works either do not generate sequential latent variables or encode each latent variable only from inputs at earlier time-steps. We argue that, in real-world situations, the latent variable at a given time-step should be encoded from both preceding and succeeding observations. Building on this observation, we theoretically derive the bidirectional Long Short-Term Memory Variational Autoencoder (bLSTM-VAE), a novel VAE variant whose encoders and decoders are implemented as bidirectional Long Short-Term Memory (bLSTM) networks. The proposed bLSTM-VAE encodes a sequential input as a latent-variable sequence of equal length. The latent variable at a given time-step is encoded by simultaneously processing the observations from the first time-step up to the current one in forward order and the observations from the last time-step back to the current one in backward order. As a result, we expect the bLSTM-VAE to learn latent variables reliably by exploiting contextual information from the entire input sequence. To validate the proposed method, we apply it to gesture recognition from 3D skeletal joint data, evaluating on the ChaLearn Look at People gesture dataset and the NTU RGB+D dataset. The experimental results show that a classification network combined with the proposed bLSTM-VAE performs better than one combined with a standard VAE, and also outperforms several state-of-the-art methods.
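The abstract describes the encoder only at a high level: the latent variable at time-step t is inferred from a forward pass over the frames up to t and a backward pass over the frames from the end of the sequence down to t. The following is a minimal, pure-Python sketch of that idea under stated assumptions, not the authors' implementation. Everything beyond the abstract is an assumption: a single-layer LSTM per direction without biases, random untrained weights, linear heads producing a diagonal Gaussian per time-step, a standard-normal prior for the KL term, and the hypothetical names `LSTMCell`, `BiLSTMEncoder`, and `kl_to_standard_normal`. The bLSTM decoder and the training loop are omitted.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Hypothetical minimal LSTM: gates read [x_t; h_{t-1}], biases omitted."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(hid_dim, in_dim + hid_dim, rng) for k in "ifog"}

    def run(self, xs):
        """Process xs in the given order; return the hidden state at every step."""
        h, c, hs = [0.0] * self.hid_dim, [0.0] * self.hid_dim, []
        for x in xs:
            xh = x + h  # list concatenation [x_t; h_{t-1}]
            i = [sigmoid(a) for a in matvec(self.W["i"], xh)]
            f = [sigmoid(a) for a in matvec(self.W["f"], xh)]
            o = [sigmoid(a) for a in matvec(self.W["o"], xh)]
            g = [math.tanh(a) for a in matvec(self.W["g"], xh)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            hs.append(h)
        return hs


class BiLSTMEncoder:
    """Hypothetical bidirectional encoder: one Gaussian posterior per time-step."""

    def __init__(self, in_dim, hid_dim, z_dim, seed=0):
        self.rng = random.Random(seed)
        self.fwd = LSTMCell(in_dim, hid_dim, self.rng)
        self.bwd = LSTMCell(in_dim, hid_dim, self.rng)
        # Linear heads mapping [h_fwd_t; h_bwd_t] to posterior mean and log-variance.
        self.W_mu = rand_matrix(z_dim, 2 * hid_dim, self.rng)
        self.W_logvar = rand_matrix(z_dim, 2 * hid_dim, self.rng)

    def encode(self, xs):
        h_fwd = self.fwd.run(xs)              # h_fwd[t] has seen frames 0 .. t
        h_bwd = self.bwd.run(xs[::-1])[::-1]  # h_bwd[t] has seen frames T-1 .. t
        zs, mus, logvars = [], [], []
        for hf, hb in zip(h_fwd, h_bwd):
            ctx = hf + hb  # context from both directions
            mu = matvec(self.W_mu, ctx)
            logvar = matvec(self.W_logvar, ctx)
            # Reparameterization: z_t = mu_t + sigma_t * eps, eps ~ N(0, I).
            z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                 for m, lv in zip(mu, logvar)]
            zs.append(z)
            mus.append(mu)
            logvars.append(logvar)
        return zs, mus, logvars


def kl_to_standard_normal(mus, logvars):
    """KL(q(z_t | x) || N(0, I)) summed over time-steps and latent dimensions."""
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for mu, lvar in zip(mus, logvars)
               for m, lv in zip(mu, lvar))


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]  # 5 frames, 3 features
    enc = BiLSTMEncoder(in_dim=3, hid_dim=4, z_dim=2)
    zs, mus, logvars = enc.encode(xs)
    print(len(zs), len(zs[0]))  # prints: 5 2
    print(kl_to_standard_normal(mus, logvars))
```

Because the backward LSTM starts from the last frame, changing the final frame alters the posterior at the first time-step, whereas a forward-only encoder would leave it unchanged; this is the contextual dependence the abstract argues for. A practical model would more likely use a framework's built-in bidirectional LSTM (for example PyTorch's `nn.LSTM` with `bidirectional=True`) than hand-written cells.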
|Pages:||1 - 11|
|Host publication:|| Proceedings of the British Machine Vision Conference 2018 (BMVC), 3–6 September, Newcastle, UK
|Conference:|| British Machine Vision Conference
|Type of Publication:|| D3 Professional conference proceedings
|Field of Science:|| 113 Computer and information sciences
This work was supported by the Academy of Finland, Tekes Fidipro program (Grant No. 1849/31/2015) and Business Finland project (Grant No. 3116/31/2017), Infotech Oulu, and Nokia visiting professor grant. The authors wish to acknowledge CSC – IT Center for Science, Finland, for computational resources.
© 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.