University of Oulu

Y. Guo, Y. Liu, M. H. T. De Boer, L. Liu and M. S. Lew, "A Dual Prediction Network for Image Captioning," 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, 2018, pp. 1-6. doi: 10.1109/ICME.2018.8486491

A dual prediction network for image captioning

Author: Guo, Yanming (1); Liu, Yu (2); de Boer, Maaike H.T. (3);
Organizations: (1) College of System Engineering, National University of Defense Technology, Changsha, China
(2) LIACS Media Lab, Leiden University, Leiden, the Netherlands
(3) TNO, Anna van Buerenplein 1, The Hague, the Netherlands
(4) CMVS, University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.6 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2019091027599
Language: English
Published: Institute of Electrical and Electronics Engineers, 2018
Publish Date: 2019-09-10
Description:

Abstract

General captioning practice involves a single forward prediction, which aims to predict the word at the next timestep given the word at the current timestep. In this paper, we present a novel captioning framework, the Dual Prediction Network (DPN), which is end-to-end trainable and addresses the captioning problem with dual predictions. Specifically, the dual predictions consist of a forward prediction that generates the next word from the current input word, and a backward prediction that reconstructs the input word from the predicted word. DPN has two appealing properties: 1) by introducing an extra supervision signal on the prediction, DPN can better capture the interplay between the input and the target; 2) using the reconstructed input, DPN can make a second, new prediction. During the test phase, we average both predictions to form the final target sentence. Experimental results on the MS COCO dataset demonstrate that, benefiting from the reconstruction step, both predictions generated by DPN outperform those of methods based on the general captioning practice (a single forward prediction), and that averaging them brings a further accuracy boost. Overall, DPN achieves results competitive with state-of-the-art approaches across multiple evaluation metrics.
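The extra reconstruction supervision and the test-time averaging described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that each prediction yields a probability distribution over a shared vocabulary at every timestep, that the reconstruction signal is a cross-entropy term on the input word scaled by a hypothetical weight `lam`, and that averaging is done per timestep before a greedy word choice. All function names and the toy vocabulary are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def dual_loss(p_forward: Sequence[float], next_word: int,
              p_backward: Sequence[float], input_word: int,
              lam: float = 1.0) -> float:
    """Hypothetical training objective: forward next-word loss plus a
    reconstruction loss on the input word, weighted by the assumed factor lam."""
    return cross_entropy(p_forward, next_word) + lam * cross_entropy(p_backward, input_word)


def average_predictions(p_first: Sequence[float], p_second: Sequence[float]) -> List[float]:
    """Element-wise average of two word distributions (assumed test-time fusion)."""
    if len(p_first) != len(p_second):
        raise ValueError("distributions must cover the same vocabulary")
    return [(a + b) / 2 for a, b in zip(p_first, p_second)]


def pick_word(probs: Sequence[float], vocab: Sequence[str]) -> str:
    """Greedy decoding step: return the word with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best]


if __name__ == "__main__":
    vocab = ["a", "dog", "runs", "<eos>"]   # toy vocabulary (illustrative)
    p1 = [0.1, 0.5, 0.3, 0.1]               # first prediction (forward)
    p2 = [0.1, 0.2, 0.6, 0.1]               # second prediction (from reconstructed input)
    print(pick_word(p1, vocab))                           # dog
    print(pick_word(average_predictions(p1, p2), vocab))  # runs
```

In this toy example, the first prediction alone selects "dog", while the averaged distribution selects "runs", showing how combining two predictions can change the output. The abstract does not specify whether DPN averages word distributions or combines the predictions at another stage, so the per-timestep averaging here is an assumption.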


Series: IEEE International Conference on Multimedia and Expo
ISSN: 1945-7871
ISSN-E: 1945-788X
ISSN-L: 1945-7871
ISBN: 978-1-5386-1737-3
ISBN Print: 978-1-5386-1738-0
Pages: 1 - 6
DOI: 10.1109/ICME.2018.8486491
OADOI: https://oadoi.org/10.1109/ICME.2018.8486491
Host publication: 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
Conference: IEEE International Conference on Multimedia and Expo
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Copyright information: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.