Received: February 09, 2023 Revised: March 14, 2023
Abstract: Current trademark sub-card processing first performs text detection, then region classification, and finally splits and recombines the resulting regions to form trademark sub-cards. This step-by-step pipeline is slow, and errors accumulate across stages, which lowers the accuracy of the final result. To address this, this study proposes TextCls, a multi-task network model designed to improve the inference speed and precision of the detection and classification modules. TextCls consists of a shared feature extraction network and two task branches, text detection and region classification. The text detection branch uses a segmentation network to learn a pixel classification map, which mainly captures text-pixel versus background-pixel information, and then applies pixel aggregation to obtain text boxes. The region classification branch classifies region features as Chinese, English, or graphic, focusing on the characteristics of each region type. Because the two branches share the feature extraction network, pixel information and region features reinforce each other during learning, and the precision of both tasks improves. To make up for the lack of text detection datasets for trademark images and to verify the effectiveness of TextCls, this study also collects and annotates trademark_text (https://github.com/kongbailongtian/trademark_text), a text detection dataset of 2000 trademark images. Results show that, compared with the best-performing text detection algorithm, the text detection branch of TextCls raises precision from 94.44% to 95.16%, with an F1 score (the harmonic mean of precision and recall) of 92.12%; the F1 score of the region classification branch also rises from 97.09% to 98.18%.
Keywords: trademark sub-card | end-to-end | text detection | multi-task learning | dataset
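The abstract reports F1 as the harmonic mean of precision and recall but does not state recall. The sketch below shows how the two quantities relate and how a recall value could be back-calculated from the published precision and F1. The function names are illustrative and not from the paper, and any recall obtained this way is only an approximation derived from rounded figures, not a reported result.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Hypothetical helper for illustration; requires 2P > F1.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denominator


# Figures quoted in the abstract for the text detection branch.
detection_precision = 0.9516
detection_f1 = 0.9212

recall_estimate = implied_recall(detection_precision, detection_f1)
print(f"implied recall (approximate): {recall_estimate:.4f}")
```

Because F1 penalizes imbalance between precision and recall, an F1 below the reported precision indicates that recall is the lower of the two for the detection branch.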
Funding: Guangdong Basic and Applied Basic Research Foundation, Regional Joint Fund, Youth Fund (2021A1515110673)
Citation:
ZHANG Zhen-Yan (张贞䶮), SU Hai (苏海), YU Song-Sen (余松森). End-to-end Multi-task Trademark Sub-card Model (基于端到端的多任务商标分卡模型). Computer Systems Applications (计算机系统应用), 2023, 32(8): 105-115