University of Oulu

R. Togashi, M. Otani, Y. Nakashima, E. Rahtu, J. Heikkilä and T. Sakai, "AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval," 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 21044-21053, doi: 10.1109/CVPR52688.2022.02040.

AxIoU: an axiomatically justified measure for video moment retrieval

Author: Togashi, Riku (1); Otani, Mayu (2); Nakashima, Yuta (3); Rahtu, Esa (4); Heikkilä, Janne (5); Sakai, Tetsuya (6)
Organizations: (1) CyberAgent, Inc.; Waseda University
(2) CyberAgent, Inc.
(3) Osaka University
(4) Tampere University
(5) University of Oulu
(6) Waseda University
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 2.5 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe202301245396
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2023-01-24
Description:

Abstract

Evaluation measures have a crucial impact on the direction of research. Therefore, it is of utmost importance to develop appropriate and reliable evaluation measures for new applications where conventional measures are not well suited. Video Moment Retrieval (VMR) is one such application, and the current practice is to use R@K,θ for evaluating VMR systems. However, this measure has two disadvantages. First, it is rank-insensitive: it ignores the rank positions of successfully localised moments in the top-K ranked list by treating the list as a set. Second, it binarizes the Intersection over Union (IoU) of each retrieved video moment using the threshold θ, thereby ignoring the fine-grained localisation quality of ranked moments. We propose an alternative measure for evaluating VMR, called Average Max IoU (AxIoU), which is free from the above two problems. We show that AxIoU satisfies two important axioms for VMR evaluation, namely, Invariance against Redundant Moments and Monotonicity with respect to the Best Moment, and also that R@K,θ satisfies the first axiom only. We also empirically examine how well AxIoU agrees with R@K,θ, as well as its stability with respect to changes in the test data and in the human-annotated temporal boundaries.
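The two disadvantages of R@K,θ named in the abstract can be illustrated for a single query with one ground-truth moment. The Python sketch below is hypothetical and not the authors' code. temporal_iou and recall_at_k follow the standard definitions of temporal IoU and R@K,θ. axiou_at_k assumes that AxIoU is the mean, over cutoffs 1..K, of the best IoU found so far in the ranked list. This is one reading of the name "Average Max IoU"; the paper gives the authoritative definition, including how multiple ground truths and averaging over queries are handled. All function and variable names are illustrative.

```python
from typing import Sequence, Tuple

# A moment is a (start, end) time interval, e.g. in seconds.
Moment = Tuple[float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection over Union (IoU) of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked: Sequence[Moment], gt: Moment, k: int, theta: float) -> float:
    """R@K,theta for one query: 1.0 if any top-K moment has IoU >= theta.

    The top-K list is treated as a set (rank-insensitive), and each IoU is
    binarized against theta.
    """
    return float(any(temporal_iou(m, gt) >= theta for m in ranked[:k]))


def axiou_at_k(ranked: Sequence[Moment], gt: Moment, k: int) -> float:
    """Hypothetical sketch of Average Max IoU (AxIoU) for one query.

    ASSUMPTION: the score is the mean, over cutoffs k' = 1..K, of the best
    IoU among the top-k' moments. See the paper for the exact definition.
    """
    if len(ranked) < k:
        raise ValueError("this sketch expects at least K ranked moments")
    best = total = 0.0
    for m in ranked[:k]:
        best = max(best, temporal_iou(m, gt))  # best IoU seen up to this rank
        total += best
    return total / k


gt = (10.0, 20.0)
good_first = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]  # exact hit at rank 1
good_last = [(30.0, 40.0), (50.0, 60.0), (10.0, 20.0)]   # same hit at rank 3

# Rank insensitivity: R@3,0.5 cannot tell the two lists apart.
print(recall_at_k(good_first, gt, 3, 0.5), recall_at_k(good_last, gt, 3, 0.5))  # 1.0 1.0
print(axiou_at_k(good_first, gt, 3), round(axiou_at_k(good_last, gt, 3), 3))    # 1.0 0.333

# Binarization: IoU 0.5 and IoU 1.0 both count as a hit at theta = 0.5.
coarse, exact = [(10.0, 15.0)], [(10.0, 20.0)]
print(recall_at_k(coarse, gt, 1, 0.5), recall_at_k(exact, gt, 1, 0.5))  # 1.0 1.0
print(axiou_at_k(coarse, gt, 1), axiou_at_k(exact, gt, 1))              # 0.5 1.0
```

Under this assumed definition, the score rewards placing the best-localised moment early and gives partial credit for partial overlap. These are the two properties that R@K,θ lacks. In an actual evaluation, per-query scores would be averaged over the test set.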


Series: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN: 1063-6919
ISSN-E: 2575-7075
ISSN-L: 1063-6919
ISBN: 978-1-6654-6946-3
ISBN Print: 978-1-6654-6947-0
Pages: 21044 - 21053
DOI: 10.1109/cvpr52688.2022.02040
OADOI: https://oadoi.org/10.1109/cvpr52688.2022.02040
Host publication: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Conference: IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Funding: This work was partly supported by JST CREST Grant No. JPMJCR20D3, FOREST Grant No. JPMJFR2160, and Academy of Finland project number 324346.
Copyright information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.