University of Oulu

X. Zhao, Y. Lin, L. Liu, J. Heikkilä and W. Zheng, "Dynamic Texture Classification Using Unsupervised 3D Filter Learning and Local Binary Encoding," in IEEE Transactions on Multimedia, vol. 21, no. 7, pp. 1694-1708, July 2019. doi: 10.1109/TMM.2018.2890362

Dynamic texture classification using unsupervised 3D filter learning and local binary encoding

Author: Zhao, Xiaochao (1,2); Lin, Yaping (1); Heikkilä, Janne (2)
Organizations: (1) College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, P. R. China
(2) Center for Machine Vision and Signal Analysis, P.O. Box 4500 FI-90014, University of Oulu, Finland
(3) Key Laboratory of Child Development and Learning Science of Ministry of Education, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 2.2 MB)
Persistent link:
Language: English
Published: Institute of Electrical and Electronics Engineers, 2019
Publish Date: 2019-12-10


Local binary descriptors, such as the local binary pattern (LBP) and its many variants, have been studied extensively in texture and dynamic texture analysis because of their grayscale invariance, low computational complexity, and good discriminability. Most existing local binary feature extraction methods view a dynamic texture as a volume in 3D space and extract spatio-temporal features from three orthogonal planes of that volume. Consequently, for a given pixel in a video, only a fraction of its surrounding pixels is incorporated in the local binary feature extraction process. We argue that the ignored pixels contain discriminative information that should be explored. To fully utilize the information conveyed by all the pixels in a local neighborhood, we propose extracting local binary features from the spatio-temporal domain with 3D filters that are learned in an unsupervised manner, so that the discriminative features along both the spatial and temporal dimensions are captured simultaneously. The proposed approach consists of three components: 1) 3D filtering; 2) binary hashing; and 3) joint histogramming. Densely sampled 3D blocks of a dynamic texture are first normalized to have zero mean and are then filtered by 3D filters that are learned in advance. To preserve more of the structural information, the filter response vectors are decomposed into two complementary components, the signs and the magnitudes, which are encoded separately into binary codes. The local mean pixel values of the 3D blocks are also converted into binary codes. Finally, the three types of binary codes are combined via joint or hybrid histograms to form the final feature representation. Extensive experiments are conducted on three commonly used dynamic texture databases: 1) UCLA; 2) DynTex; and 3) YUVL. The proposed method achieves results comparable to, and even better than, those of many state-of-the-art methods.
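
The three components named in the abstract (3D filtering, binary hashing, joint histogramming) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the filters are learned or which thresholds are used, so the following are assumptions made only for illustration: filters are taken as the leading right singular vectors of the zero-mean block matrix (a PCA-style choice), signs are thresholded at zero, magnitudes at their per-filter mean, and local means at the global mean. The function names `extract_blocks`, `learn_filters`, and `describe` are hypothetical.

```python
import numpy as np


def extract_blocks(video, size):
    """Densely sample size x size x size blocks; return one flattened block per row."""
    windows = np.lib.stride_tricks.sliding_window_view(video, (size, size, size))
    return windows.reshape(-1, size ** 3).astype(np.float64)


def learn_filters(blocks, n_filters):
    """Unsupervised filter learning (assumption: PCA-style, via SVD)."""
    # Normalize every block to zero mean, as described in the abstract.
    centered = blocks - blocks.mean(axis=1, keepdims=True)
    # Rows of vt are orthonormal directions ordered by explained energy.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_filters]


def describe(video, filters, size):
    """Return a normalized joint histogram of sign, magnitude, and mean codes."""
    blocks = extract_blocks(video, size)
    means = blocks.mean(axis=1)

    # 1) 3D filtering: project each zero-mean block onto every filter.
    responses = (blocks - means[:, None]) @ filters.T

    # 2) Binary hashing: one bit per filter, packed into an integer code.
    n_filters = filters.shape[0]
    weights = 1 << np.arange(n_filters)
    sign_codes = (responses > 0).astype(np.int64) @ weights
    magnitudes = np.abs(responses)
    # Assumed threshold: mean magnitude of each filter over the whole video.
    mag_codes = (magnitudes > magnitudes.mean(axis=0)).astype(np.int64) @ weights
    # Assumed threshold: global mean of the local block means (one bit).
    mean_bits = (means > means.mean()).astype(np.int64)

    # 3) Joint histogramming: count co-occurrences of the three code types.
    n_codes = 1 << n_filters
    joint = (mean_bits * n_codes + mag_codes) * n_codes + sign_codes
    hist = np.bincount(joint, minlength=2 * n_codes * n_codes).astype(np.float64)
    return hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((8, 8, 8))  # toy stand-in for a dynamic texture clip
    filters = learn_filters(extract_blocks(video, 3), n_filters=4)
    feature = describe(video, filters, 3)
    print(feature.shape)
```

In this sketch the joint histogram grows as 2 · 2^n · 2^n for n filters, which suggests why a hybrid of smaller histograms, as the abstract also mentions, can be preferable when more filters are used.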


Series: IEEE transactions on multimedia
ISSN: 1520-9210
ISSN-E: 1941-0077
ISSN-L: 1520-9210
Volume: 21
Issue: 7
Pages: 1694 - 1708
DOI: 10.1109/TMM.2018.2890362
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Copyright information: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.