University of Oulu

G. Trigeorgis, M. A. Nicolaou, S. Zafeiriou and B. W. Schuller, "Deep Canonical Time Warping," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 5110-5118. doi: 10.1109/CVPR.2016.552

Deep canonical time warping

Author: Trigeorgis, George1; Nicolaou, Mihalis A.2; Zafeiriou, Stefanos1,3; Schuller, Björn W.
Organizations: 1Imperial College London, UK
2Goldsmiths, University of London, UK
3Center for Machine Vision and Signal Analysis, University of Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 3.8 MB)
Language: English
Published: Institute of Electrical and Electronics Engineers, 2016
Publish Date: 2019-02-27


Machine learning algorithms for the analysis of time-series often depend on the assumption that the utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture the properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards the temporal alignment of time-series are applied directly on the observation space, or utilise simple linear projections. Thus, they fail to capture complex, hierarchical non-linear representations that may prove beneficial for the task of temporal alignment, particularly when dealing with multi-modal data (e.g., aligning visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method which automatically learns complex non-linear representations of multiple time-series, generated such that (i) they are highly correlated, and (ii) they are temporally aligned. By means of experiments on four real datasets, we show that the representations learnt via the proposed DCTW significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with highly heterogeneous features, such as the temporal alignment of acoustic and visual features.
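The alignment component underlying this family of methods is dynamic time warping (DTW), which finds the minimum-cost monotonic correspondence between two sequences by dynamic programming. The sketch below is a minimal, illustrative pure-Python DTW, not the paper's full method (which additionally learns deep, correlated feature representations before aligning them); the function name and example sequences are the author's own for illustration.

```python
def dtw_align(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping: minimum-cost monotonic alignment of x and y."""
    n, m = len(x), len(y)
    INF = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(x[i - 1], y[j - 1]) + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    # Backtrack the optimal warping path from the end of both sequences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Example: y is x with some samples repeated; DTW recovers a zero-cost alignment.
c, p = dtw_align([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
```

DCTW can be viewed as alternating between such a warping step and updating the learned non-linear projections of each modality so that the warped representations become maximally correlated.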


ISBN Print: 978-1-4673-8851-1
Pages: 5110 - 5118
DOI: 10.1109/CVPR.2016.552
Host publication: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Conference: IEEE Conference on Computer Vision and Pattern Recognition
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Funding: George Trigeorgis is a recipient of the fellowship of the Department of Computing, Imperial College London, and this work was partially funded by it. The work of Stefanos Zafeiriou was partially funded by the EPSRC project EP/J017787/1 (4D-FAB) and by the FiDiPro program of Tekes (project number: 1849/31/2015). The work of Björn W. Schuller was partially funded by the European Community's Horizon 2020 Framework Programme under grant agreement No. 645378 (ARIA-VALUSPA). We would like to thank the NVIDIA Corporation for donating a Tesla K40 GPU used in this work.
Copyright information: © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.