End-to-end Multi-task Trademark Sub-card Model

doi:10.15888/j.cnki.csa.009210

Authors:

ZHANG Zhen-Yan, SU Hai, YU Song-Sen

Affiliation:

School of Software, South China Normal University, Foshan 528225, China

Fund Project:

Guangdong Basic and Applied Basic Research Foundation, Regional Joint Fund, Youth Fund (2021A1515110673)

Abstract:

Current trademark sub-card processing first performs text detection, then region classification, and finally splits and recombines the classified regions to form trademark sub-cards. This step-by-step pipeline is slow, and errors accumulate across stages, which lowers the accuracy of the final result. To address this problem, this study proposes TextCls, a multi-task network model that improves both the inference speed and the precision of the detection and classification modules used in trademark sub-card processing. TextCls consists of a feature extraction network and two task branches: text detection and region classification. The text detection branch uses a segmentation network to learn a pixel classification map, which mainly captures the distinction between text pixels and background pixels, and then applies pixel aggregation to obtain text boxes. The region classification branch classifies region features as Chinese, English, or graphic, focusing on the characteristics of each region type. Because the two branches share the feature extraction network, pixel-level information and region-level features reinforce each other during training, and the precision of both tasks improves. To compensate for the lack of text detection datasets for trademark images and to verify the effectiveness of TextCls, this study also collects and annotates trademark_text (https://github.com/kongbailongtian/trademark_text), a text detection dataset of 2000 trademark images. The results show that, compared with the best-performing text detection algorithm, the text detection branch of TextCls raises precision from 94.44% to 95.16% and reaches an F1 score (the harmonic mean of precision and recall) of 92.12%; the F1 score of the region classification branch rises from 97.09% to 98.18%.
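The abstract does not say how the pixel aggregation step is implemented. As a rough illustration only, the sketch below swaps in plain 4-connected component labeling: it takes a binary pixel classification map (1 for text, 0 for background) and returns one bounding box per connected group of text pixels. The function name `aggregate_text_pixels`, the box layout, and the toy map are all assumptions made for this example and are not taken from TextCls.

```python
from collections import deque


def aggregate_text_pixels(mask):
    """Group 4-connected text pixels (value 1) into bounding boxes.

    Hypothetical helper for illustration; not the TextCls implementation.
    Returns a list of (row_min, col_min, row_max, col_max) tuples, one per
    connected region, in the order regions are first met in a row-major scan.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected region of text pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy pixel classification map: 1 = text pixel, 0 = background pixel.
toy_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
boxes = aggregate_text_pixels(toy_mask)
print(boxes)
```

On the toy map this yields two boxes, one for each 2x2 block of text pixels. A real detector would more likely work on a thresholded probability map and might use learned cues to separate adjacent text instances, which this sketch does not attempt.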

Keywords: trademark sub-card; end-to-end; text detection; multi-task learning; datasets
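The abstract reports precision and F1 for the text detection branch but not recall. Since F1 is the harmonic mean of precision and recall, the recall implied by those two figures can be worked out. The sketch below does that arithmetic, assuming both numbers come from the same evaluation run. The resulting recall is a derived estimate, not a value stated in the source, and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R; requires 2 * precision > f1."""
    return f1 * precision / (2 * precision - f1)


# Figures reported in the abstract for the text detection branch.
precision = 0.9516
f1 = 0.9212

# Derived here for illustration; the abstract does not report recall.
implied_recall = recall_from_f1(precision, f1)
print(f"implied recall: {implied_recall:.4f}")

# Round trip: plugging the implied recall back in reproduces the reported F1.
print(f"F1 check: {f1_score(precision, implied_recall):.4f}")
```

Under that assumption the implied recall is roughly 89%, so most of the gap between the 95.16% precision and the 92.12% F1 would come from missed text regions rather than false detections.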

Cite this article:

ZHANG Zhen-Yan, SU Hai, YU Song-Sen. End-to-end multi-task trademark sub-card model. Computer Systems & Applications, 2023, 32(8): 105-115

History:

Received: 2023-02-09
Last revised: 2023-03-14
Published online: 2023-06-09
