Cross-modal Person Re-identification Based on Improved CLIP-ReID

doi:10.15888/j.cnki.csa.009741

Computer Systems & Applications, 2025, Vol. 34, No. 1, pp. 153-160

CSTR: 32024.14.csa.009741
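Fund project: Applied Basic Research Project of Liaoning Province (2022JH2/101300243); 2022 Shenyang Science and Technology Plan "Open Competition" (揭榜挂帅) Industrial Generic Technology Project (22-316-1-07)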

Authors: JIA Jun-Ying, YANG Xin-Ru, YANG Hai-Bo, XU Zhan

Affiliation: School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China

Abstract:

Narrowing the gap between modalities remains a major challenge in image-to-text cross-modal person re-identification. To address it, this study proposes an improved method based on contrastive language-image pretraining-person re-identification (CLIP-ReID) that adds a context adjustment network module and a cross-modal attention mechanism module. The context adjustment network module applies a deep nonlinear transformation to image features and combines the result with learnable context vectors, strengthening the semantic association between images and texts. The cross-modal attention mechanism module dynamically weights and fuses image and text features so that the model takes the other modality into account while processing one modality, improving the interaction between modalities. The method is evaluated on three public datasets: MSMT17, Market1501, and DukeMTMC. mAP increases by 2.2%, 0.5%, and 0.4%, respectively, and R1 increases by 1.1%, 0.1%, and 1.2%, respectively. The results show that the proposed method improves the accuracy of person re-identification.
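The abstract does not give the architecture of the cross-modal attention module, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single-head scaled dot-product cross-attention without learned projections, followed by a residual fusion; the names `cross_attend`, `softmax`, and `dot`, and the toy feature values, are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross_attend(queries, context):
    """Hypothetical single-head cross-attention (assumption: no learned projections).

    Each query vector (one modality) is fused with a weighted sum of the
    context vectors (the other modality). Returns (fused, weights).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused, all_weights = [], []
    for q in queries:
        # Dynamic weights: how strongly this query attends to each context vector.
        weights = softmax([dot(q, c) * scale for c in context])
        attended = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        # Residual fusion (an assumption): keep the query and add the attended context.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
        all_weights.append(weights)
    return fused, all_weights

# Toy features with made-up numbers: 2 image tokens, 3 text tokens, dimension 4.
image_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
text_feats = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]

# Apply attention in both directions so each modality sees the other.
image_given_text, img_w = cross_attend(image_feats, text_feats)
text_given_image, txt_w = cross_attend(text_feats, image_feats)
```

In this toy setup the first image token puts most of its weight on the first text token, because their features point in the same direction. A real module would typically add learned query, key, and value projections and multiple heads.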

Keywords: person re-identification; cross-modal; attention mechanism
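For readers unfamiliar with the reported metrics, the sketch below shows how mAP (mean average precision) and R1 (Rank-1 accuracy) are commonly computed from ranked retrieval results. It is an illustration under simplifying assumptions: the helper names and toy identities are hypothetical, and standard benchmark protocols also exclude same-camera gallery matches, which is omitted here.

```python
def average_precision(ranked_ids, query_id):
    """AP for one query: mean of the precision at each rank where a true match occurs."""
    hits, precisions = 0, []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(rankings):
    """rankings: list of (query_id, gallery ids sorted by descending similarity).

    Returns (mAP, Rank-1 accuracy) as fractions in [0, 1].
    """
    aps = [average_precision(ranked, qid) for qid, ranked in rankings]
    rank1 = [1.0 if ranked and ranked[0] == qid else 0.0 for qid, ranked in rankings]
    return sum(aps) / len(aps), sum(rank1) / len(rank1)

# Toy example with made-up identities.
rankings = [
    ("A", ["A", "B", "A", "C"]),  # matches at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
    ("B", ["C", "B", "A", "B"]),  # matches at ranks 2 and 4 -> AP = (1/2 + 2/4) / 2
]
mean_ap, rank1_acc = evaluate(rankings)
print(f"mAP = {mean_ap:.4f}, R1 = {rank1_acc:.2f}")
```

R1 only checks whether the top-ranked gallery image is correct, while mAP rewards placing all correct matches near the top, which is why the two can move by different amounts on the same dataset.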

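Cite this article

JIA Jun-Ying, YANG Xin-Ru, YANG Hai-Bo, XU Zhan. Cross-modal Person Re-identification Based on Improved CLIP-ReID. Computer Systems & Applications, 2025, 34(1): 153-160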

History

Received: 2024-06-24
Last revised: 2024-07-18
Published online: 2024-11-28
