University of Oulu

G. Trigeorgis, M. A. Nicolaou, B. W. Schuller and S. Zafeiriou, "Deep Canonical Time Warping for Simultaneous Alignment and Representation Learning of Sequences," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1128-1138, 1 May 2018. doi: 10.1109/TPAMI.2017.2710047

Deep canonical time warping for simultaneous alignment and representation learning of sequences

Author: Trigeorgis, George (1); Nicolaou, Mihalis A. (2); Schuller, Björn W. (1); Zafeiriou, Stefanos (3)
Organizations: (1) Department of Computing, Imperial College London, London SW7 2RH, United Kingdom
(2) Department of Computing at Goldsmiths, University of London, London WC1E 7HU, United Kingdom
(3) Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90014, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 3.7 MB)
Persistent link:
Language: English
Published: Institute of Electrical and Electronics Engineers, 2018
Publish Date: 2019-04-26


Machine learning algorithms for the analysis of time-series often depend on the assumption that utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards temporal alignment are either applied directly on the observation space or simply utilise linear projections, thus failing to capture complex, hierarchical non-linear representations that may prove beneficial, especially when dealing with multi-modal data (e.g., visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method that automatically learns non-linear representations of multiple time-series that are (i) maximally correlated in a shared subspace, and (ii) temporally aligned. Furthermore, we extend DCTW to a supervised setting, where during training, available labels can be utilised towards enhancing the alignment process. By means of experiments on four datasets, we show that the representations learnt significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with heterogeneous feature sets, such as the temporal alignment of acoustic and visual information.
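
To make the notion of temporal alignment in the abstract concrete, the following is a minimal sketch of classic dynamic time warping (DTW) applied directly in the observation space, followed by a correlation check on the aligned frames. It is not the DCTW method itself, which learns non-linear representations with deep networks; the function name dtw_path and the toy sequences are assumptions made purely for illustration.

```python
import numpy as np


def dtw_path(x, y):
    """Classic DTW between two sequences of frames (illustrative sketch only).

    Returns the total alignment cost and the warping path as a list of
    (index_in_x, index_in_y) pairs.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    tx, ty = len(x), len(y)

    # Pairwise squared Euclidean distance between every frame of x and y.
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # acc[i, j] holds the cost of the best path ending at frames (i-1, j-1).
    acc = np.full((tx + 1, ty + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, tx + 1):
        for j in range(1, ty + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev

    # Backtrack from the last pair of frames to the first.
    i, j = tx, ty
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[tx, ty]), path


# Toy data: y is x with two frames repeated, i.e. a temporally warped copy.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 3.0])

cost, path = dtw_path(x, y)
ix, iy = zip(*path)

# Correlation between the two sequences after warping them onto the path.
aligned_corr = float(np.corrcoef(x[list(ix)], y[list(iy)])[0, 1])
```

On this toy pair the warping path repeats frames of x so that both sequences have equal length and match frame by frame. Methods in the canonical time warping family, as the abstract describes, additionally project the sequences into a shared subspace before aligning them, which this sketch does not attempt.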


Series: IEEE transactions on pattern analysis and machine intelligence
ISSN: 0162-8828
ISSN-E: 2160-9292
ISSN-L: 0162-8828
Volume: 40
Issue: 5
Pages: 1128 - 1138
DOI: 10.1109/TPAMI.2017.2710047
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: George Trigeorgis is a recipient of the fellowship of the Department of Computing, Imperial College London, and this work was partially funded by it. The work of Stefanos Zafeiriou was partially funded by the EPSRC project EP/J017787/1 (4D-FAB), as well as by the FiDiPro program of Tekes (project number: 1849/31/2015). The work of Björn W. Schuller was partially funded by the European Community’s Horizon 2020 Framework Programme under grant agreement No. 645378 (ARIA-VALUSPA). We also thank the NVIDIA Corporation for donating a Titan X GPU used in this work. The responsibility lies with the authors.
Copyright information: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.