Deep discriminative model for video classification

Tavakolian, Mohammad; Hadid, Abdenour

Tavakolian, Mohammad; Hadid, Abdenour (2018-10-06)

URL: https://doi.org/10.1007/978-3-030-01225-0_24

Publisher: Springer Nature

Date: 06.10.2018

Tavakolian M., Hadid A. (2018) Deep Discriminative Model for Video Classification. In: Ferrari V., Hebert M., Sminchisescu C., Weiss Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol 11208. Springer, Cham

https://rightsstatements.org/vocab/InC/1.0/
© Springer Nature Switzerland AG 2018. This is a post-peer-review, pre-copyedit version of an article published in Computer Vision – ECCV 2018. ECCV 2018. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-01225-0_24.

The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2020041415344

Abstract

This paper presents a new deep learning approach for video-based scene classification. We design a Heterogeneous Deep Discriminative Model (HDDM) whose parameters are initialized by unsupervised layer-wise pre-training using Gaussian Restricted Boltzmann Machines (GRBM). To avoid the redundancy of adjacent frames, we extract spatiotemporal variation patterns within frames and represent them sparsely using the Sparse Cubic Symmetrical Pattern (SCSP). A pre-initialized HDDM is then trained separately on the videos of each class to learn class-specific models. Classification uses a weighted voting strategy based on the minimum reconstruction error from the learnt class-specific models. The proposed method is extensively evaluated on two action recognition datasets (UCF101 and Hollywood II) and on three dynamic texture and dynamic scene datasets (DynTex, YUPENN, and Maryland). The experimental results and comparisons against state-of-the-art methods demonstrate that the proposed method consistently achieves superior performance on all datasets.
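
The classification rule described in the abstract (one model per class, decision by minimum reconstruction error, combined through weighted voting) can be illustrated with a minimal sketch. This is not the authors' HDDM: the class-specific model is replaced here by a linear subspace fitted with SVD, and the names `SubspaceModel` and `classify_video`, as well as the vote weight `1 / (1 + error)`, are assumptions introduced only for illustration. The abstract does not specify the weighting scheme.

```python
import numpy as np


class SubspaceModel:
    """Hypothetical stand-in for a class-specific model (NOT the paper's HDDM):
    a linear subspace fitted with SVD on the training features of one class."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        # X has shape (n_samples, n_features)
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.basis_ = vt[: self.n_components]  # orthonormal rows
        return self

    def reconstruction_error(self, X):
        # Squared distance between each sample and its projection onto the subspace
        centered = X - self.mean_
        reconstructed = centered @ self.basis_.T @ self.basis_
        return np.sum((centered - reconstructed) ** 2, axis=1)


def classify_video(models, segments):
    """Label a video from its segment features, shape (n_segments, n_features).

    Each segment votes for the class whose model reconstructs it best.
    The vote weight 1 / (1 + error) is an assumed choice, not taken from the paper.
    """
    labels = list(models)
    errors = np.stack(
        [models[label].reconstruction_error(segments) for label in labels], axis=1
    )  # shape (n_segments, n_classes)
    winners = errors.argmin(axis=1)
    weights = 1.0 / (1.0 + errors.min(axis=1))
    votes = np.zeros(len(labels))
    np.add.at(votes, winners, weights)  # accumulate weighted votes per class
    return labels[int(votes.argmax())], dict(zip(labels, votes))
```

Under these assumptions, `models` maps each class label to a fitted `SubspaceModel`, and `classify_video(models, segments)` returns the predicted label together with the accumulated vote per class. Swapping the subspace for a deep autoencoder would keep the voting logic unchanged, since it relies only on `reconstruction_error`.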
