University of Oulu

Liu, J., Dai, H.-N., Zhao, G., Li, B., & Zhang, T. (2022). TMVOS: Triplet matching for efficient video object segmentation. Signal Processing: Image Communication, 107, 116779. https://doi.org/10.1016/j.image.2022.116779

TMVOS: triplet matching for efficient video object segmentation

Author: Liu, Jiajia (1); Dai, Hong-Ning (2); Zhao, Guoying (3)
Organizations: (1) School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
(2) Department of Computing and Decision Sciences, Lingnan University, Hong Kong, China
(3) Faculty of Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: embargoed
Persistent link: http://urn.fi/urn:nbn:fi-fe2023032733239
Language: English
Published: Elsevier, 2022
Publish Date: 2024-06-15
Description:

Abstract

Video object segmentation (VOS) is a critical yet challenging task in video analysis. Recently, many pixel-level matching VOS methods have achieved outstanding performance without time-consuming fine-tuning. However, most of these methods pay little attention to (i) matching background pixels and (ii) learning embeddings that discriminate between classes. To address these issues, we propose a new end-to-end trainable method, namely Triplet Matching for efficient semi-supervised Video Object Segmentation (TMVOS). In particular, we devise a new triplet matching strategy that considers both foreground and background matching and, for every anchor, pushes the nearest negative embedding further away than the nearest positive one. As a result, the method implicitly enlarges the distances between embeddings of different classes and thereby generates accurate matching maps. Additionally, a dual decoder is used to refine the final segmentation, so that the model better fits both the complex background and the relatively simple targets. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods in terms of both accuracy and running time. The source code is available at: https://github.com/CVisionProcessing/TMVOS.
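
The abstract describes the triplet matching idea only in words. Below is a minimal Python sketch of that idea, assuming Euclidean distance, binary labels (1 = foreground, 0 = background), a hinge margin of 0.5, and plain lists of vectors instead of dense feature maps. The function names and the margin are hypothetical, and this is not the TMVOS implementation (the authors' code is in the repository linked above). For each anchor it computes the distance to the nearest foreground and nearest background reference embedding (the matching step), then penalises the anchor unless its nearest negative lies at least the margin further away than its nearest positive.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: an embedding is a plain vector of floats; labels are
# 1 (foreground) or 0 (background). Real VOS models use dense feature maps.
Embedding = Sequence[float]


def nearest_distance(anchor: Embedding, refs: Sequence[Embedding]) -> float:
    """Euclidean distance from `anchor` to its nearest embedding in `refs`."""
    if not refs:
        raise ValueError("need at least one reference embedding per class")
    return min(math.dist(anchor, r) for r in refs)


def matching_maps(
    anchors: Sequence[Embedding],
    ref_embeddings: Sequence[Embedding],
    ref_labels: Sequence[int],
) -> List[Tuple[float, float]]:
    """For each anchor, return (nearest foreground distance, nearest background distance).

    Background references are matched as well as foreground ones, following
    the abstract's point that background pixels should also be matched.
    """
    fg = [e for e, lab in zip(ref_embeddings, ref_labels) if lab == 1]
    bg = [e for e, lab in zip(ref_embeddings, ref_labels) if lab == 0]
    return [(nearest_distance(a, fg), nearest_distance(a, bg)) for a in anchors]


def triplet_matching_loss(
    anchors: Sequence[Embedding],
    anchor_labels: Sequence[int],
    ref_embeddings: Sequence[Embedding],
    ref_labels: Sequence[int],
    margin: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Mean hinge loss max(0, d_pos - d_neg + margin) over all anchors.

    d_pos is the distance to the nearest same-class reference and d_neg the
    distance to the nearest other-class reference, so an anchor contributes
    zero loss only when its nearest negative is at least `margin` further
    away than its nearest positive.
    """
    maps = matching_maps(anchors, ref_embeddings, ref_labels)
    losses = []
    for (d_fg, d_bg), label in zip(maps, anchor_labels):
        d_pos, d_neg = (d_fg, d_bg) if label == 1 else (d_bg, d_fg)
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    refs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    ref_labels = [1, 1, 0, 0]
    # Anchors that sit inside their own class cluster incur no loss (0.0).
    print(triplet_matching_loss([(0.0, 0.5), (5.0, 5.5)], [1, 0], refs, ref_labels))
    # A "foreground" anchor lying next to the background cluster is penalised.
    print(triplet_matching_loss([(4.0, 4.0)], [1], refs, ref_labels))
```

Using the nearest positive and the nearest negative, rather than a randomly sampled pair, ties the loss directly to the nearest-neighbour distances that form the matching maps, which is the coupling the abstract credits for sharper matching.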


Series: Signal processing. Image communication
ISSN: 0923-5965
ISSN-E: 1879-2677
ISSN-L: 0923-5965
Volume: 107
Article number: 116779
DOI: 10.1016/j.image.2022.116779
OADOI: https://oadoi.org/10.1016/j.image.2022.116779
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
Subjects:
Funding: This research was partially supported by the National Natural Science Foundation of China (11627802, 11872185 and 12172134), by the State Scholarship Fund of China Scholarship Council (201806155 022), by the State Key Lab of Subtropical Building Science, South China University of Technology (2018ZB33), and by Macao Science and Technology Development Fund under Macao Funding Scheme for Key R & D Projects (0025/2019/AKP).
Copyright information: © 2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/