Lifelong fine-grained image retrieval
|Author:||Chen, Wei¹; Xu, Haoyang²; Pu, Nan³;|
1Academy of Advanced Technology Research of Hunan, Changsha, China
2College of Communication Engineering, Xidian University, China
3Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands
4DUT-RU International School of Information Science and Engineering, Dalian University of Technology, China
5Center for Machine Vision and Signal Analysis, University of Oulu, Finland
|Online Access:||PDF Full Text (PDF, 10.6 MB)|
|Persistent link:||http://urn.fi/urn:nbn:fi-fe20231004138664|
|Publisher:||Institute of Electrical and Electronics Engineers|
|Publish Date:||2023-10-04|
Fine-grained image retrieval has been extensively explored in a zero-shot setting: a deep model is trained on the seen categories and then evaluated for its generalization performance on the unseen ones. However, this setting is infeasible for many real-world applications, since (1) the retrieval dataset may not be fixed, with new data added constantly, and (2) data samples of the seen categories are also common in practice and are important for evaluation. In this paper, we explore lifelong fine-grained image retrieval (LFGIR), which learns continuously on a sequence of new tasks with data from different datasets. We first use knowledge distillation to minimize catastrophic forgetting of old tasks. However, training continuously on different datasets causes large domain shifts between the old and new tasks, while image retrieval is sensitive to even small shifts in the features; this tends to weaken the effectiveness of knowledge distillation from the frozen teacher. To mitigate the impact of domain shifts, we use a network inversion method to generate images of the old tasks. In addition, we design an on-the-fly teacher that transfers knowledge captured on a new task to the student to improve generalization, thereby achieving a better balance between old and new tasks. We name the whole framework Dual Knowledge Distillation (DKD); its efficacy is demonstrated by extensive experimental results on sequential tasks spanning 7 datasets.
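The dual-distillation idea summarized in the abstract can be sketched as a weighted combination of two distillation terms: one against the frozen old-task teacher and one against the on-the-fly new-task teacher. The sketch below is a minimal, generic softened-softmax formulation in NumPy; the temperature `T`, weight `alpha`, and function names are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax, computed stably along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as is conventional for distillation."""
    p = softmax(teacher_logits, T)  # teacher (reference) distribution
    q = softmax(student_logits, T)  # student distribution
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

def dual_kd_loss(student, frozen_teacher, onfly_teacher, alpha=0.5, T=2.0):
    """Hypothetical combined objective: alpha weights the old-task term
    (frozen teacher) against the new-task term (on-the-fly teacher)."""
    old_term = kd_loss(student, frozen_teacher, T)
    new_term = kd_loss(student, onfly_teacher, T)
    return alpha * old_term + (1.0 - alpha) * new_term
```

In this sketch, the frozen-teacher term penalizes drift away from old-task knowledge, while the on-the-fly-teacher term pulls the student toward representations learned on the new task; `alpha` controls the old/new balance the abstract refers to.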
IEEE Transactions on Multimedia
|Type of Publication:||A1 Journal article – refereed|
|Field of Science:||113 Computer and information sciences|
This work was supported by LIACS MediaLab at Leiden University, China Scholarship Council (CSC No. 201703170183), National Key Research and Development Program of China No. 2021YFB3100800, the Academy of Finland under grant 331883, Infotech Project FRAGES, National Natural Science Foundation of China under Grant 62102061, and the Fundamental Research Funds for the Central Universities under Grant DUT21RC(3)024. We would like to thank NVIDIA for the donation of GPU cards.
|Academy of Finland Grant Number:||331883 (Academy of Finland Funding decision)|
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.