University of Oulu

Otani M., Nakashima Y., Rahtu E., Heikkilä J., Yokoya N. (2017) Video Summarization Using Deep Semantic Features. In: Lai SH., Lepetit V., Nishino K., Sato Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10115. Springer, Cham

Video summarization using deep semantic features

Author: Otani, Mayu (1); Nakashima, Yuta (1); Rahtu, Esa (2);
Organizations: (1) Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
(2) Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 6.3 MB)
Persistent link:
Language: English
Published: Springer Nature, 2017
Publish Date: 2019-06-03


This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization much harder because almost no prior knowledge is available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it on associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset and compare them with baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features in a video summarization technique.
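
The clustering step described in the abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-segment feature vectors are assumed to be already extracted (the deep network is not reproduced here), plain k-means stands in for the unspecified "clustering-based summarization technique", and the name `summarize_segments` and its parameters are hypothetical.

```python
import numpy as np


def summarize_segments(features, num_clusters, num_iters=50, seed=0):
    """Pick one representative segment per k-means cluster.

    features: (num_segments, dim) array, one feature vector per video segment
              (assumed to come from some pretrained feature extractor).
    Returns the sorted indices of the selected segments.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct, randomly chosen segments.
    init = rng.choice(len(features), size=num_clusters, replace=False)
    centroids = features[init].copy()

    for _ in range(num_iters):
        # Assign every segment to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(num_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Keep the segment closest to each centroid, reported in temporal order.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    selected = {int(dists[:, k].argmin()) for k in range(num_clusters)}
    return sorted(selected)
```

Calling `summarize_segments(features, num_clusters=5)` on an array with one row per segment would return up to five segment indices; concatenating those segments in order gives a candidate summary. Fewer indices can come back when one segment happens to be closest to more than one centroid.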


Series: Lecture notes in computer science
ISSN: 0302-9743
ISSN-E: 1611-3349
ISSN-L: 0302-9743
ISBN: 978-3-319-54193-8
ISBN Print: 978-3-319-54192-1
Pages: 361 - 377
DOI: 10.1007/978-3-319-54193-8_23
Host publication: Computer Vision – ACCV 2016 Workshops : ACCV 2016 International Workshops, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part V
Host publication editor: Lai, Shang-Hong
Lepetit, Vincent
Nishino, Ko
Sato, Yoichi
Conference: Asian Conference on Computer Vision
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This work is partly supported by JSPS KAKENHI No. 16K16086.
Copyright information: © Springer International Publishing AG 2017. This is a post-peer-review, pre-copyedit version of an article published in ACCV 2016: Computer Vision – ACCV 2016. The final authenticated version is available online at: