A. Tejero-de-Pablos, Y. Nakashima, T. Sato, N. Yokoya, M. Linna and E. Rahtu, "Summarization of User-Generated Sports Video by Using Deep Action Recognition Features," in IEEE Transactions on Multimedia, vol. 20, no. 8, pp. 2000-2011, Aug. 2018. doi: 10.1109/TMM.2018.2794265

Summarization of user-generated sports video by using deep action recognition features

Author: Tejero-de-Pablos, Antonio (1, 2); Nakashima, Yuta (1, 3); Sato, Tomokazu (1, 4);
Organizations: 1 Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Japan
2 Graduate School of Information Science and Technology, University of Tokyo, Tokyo 113-8654, Japan
3 Institute for Datability Science, Osaka University, Osaka 565-0871, Japan
4 Faculty of Data Science, Shiga University, Shiga 522-8522, Japan
5 Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90014, Finland
6 Department of Signal Processing, Tampere University of Technology, Tampere 33101, Finland
Format: article
Version: accepted version
Access: open
Language: English
Published: Institute of Electrical and Electronics Engineers, 2018
Publish Date: 2019-06-04


Automatically generating a summary of a sports video poses the challenge of detecting the interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage the editing conventions of broadcast sports video, which facilitate the extraction of high-level semantics. However, user-generated videos are not edited, so traditional methods are not suitable for summarizing them. To address this problem, this paper proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments as interesting or uninteresting. The proposed method can be applied to any sport in which a game consists of a succession of actions. In particular, this paper uses Kendo (Japanese fencing) as an example sport to evaluate the proposed method. The method is trained on Kendo videos with ground-truth labels that indicate the video highlights. The labels are provided by annotators with different levels of Kendo experience, to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared across several combinations of features, and the results show that it outperforms previous summarization methods.
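The abstract describes classifying video segments as interesting or uninteresting, with the interesting segments forming the summary. A minimal sketch of that final assembly step follows, assuming a classifier (not shown) has already produced one highlight probability per fixed-length segment. The function name `select_highlights`, the fixed segment length and the 0.5 threshold are hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Tuple


def select_highlights(
    scores: List[float], seg_len: float, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Hypothetical helper: turn per-segment highlight probabilities into
    (start, end) time intervals in seconds.

    Assumes segment i covers [i * seg_len, (i + 1) * seg_len) and that a
    segment is "interesting" when its score is >= threshold. Consecutive
    interesting segments are merged into a single interval.
    """
    intervals: List[Tuple[float, float]] = []
    prev_selected = False
    for i, score in enumerate(scores):
        selected = score >= threshold
        if selected:
            start, end = i * seg_len, (i + 1) * seg_len
            if prev_selected:
                # Extend the interval opened by the previous segment.
                intervals[-1] = (intervals[-1][0], end)
            else:
                # Start a new highlight interval.
                intervals.append((start, end))
        prev_selected = selected
    return intervals


if __name__ == "__main__":
    # Five 2-second segments; segments 1, 2 and 4 are classified as interesting.
    print(select_highlights([0.1, 0.8, 0.9, 0.2, 0.7], seg_len=2.0))
```

Merging adjacent segments keeps the summary as a short list of contiguous clips rather than many fragments of one segment each; a real system could also pad or smooth the intervals, which this sketch does not attempt.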

Series: IEEE Transactions on Multimedia
ISSN: 1520-9210
ISSN-E: 1941-0077
ISSN-L: 1520-9210
Volume: 20
Issue: 8
Pages: 2000 - 2011
DOI: 10.1109/TMM.2018.2794265
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Funding: This work was supported in part by the Japan Society for the Promotion of Science KAKENHI under Grant 16K16086.
Copyright information: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.