Abstract:The current trademark sub-card processing method is to first carry out text detection, then conduct area classification, and finally split and combine different areas to form a trademark sub-card. This step-by-step processing takes a long time, and the accuracy of the final results will decrease due to the superposition of errors. Therefore, this study proposes a multi-task network model TextCls, which can improve the inference speed and accuracy of the detection and classification modules. TextCls consists of a feature extraction network and two task branches of text detection and regional classification. The text detection branch uses the segmentation network to learn the pixel classification map and then employs pixel aggregation to obtain the text boxes. The pixel classification map is mainly used to learn the information of text and background pixels. The regional classification branch subdivides regional features into Chinese, English, and graphics, focusing on learning the characteristics of different types of regions. Through the shared feature extraction network, the two branches continuously learn pixel information and regional features, and finally the precision of the two tasks is improved. To make up for the lack of text detection datasets for trademark images and verify the effectiveness of TextCls, this study collects and labels a text detection dataset trademark_text (https://github.com/kongbailongtian/trademark_text), which consists of 2000 trademark images. The results show that compared with the optimal text detection algorithm, the text detection branch of TextCls increases the accuracy rate from 94.44% to 95.16%, with the harmonic mean F1 score reaching 92.12%; the F1 score of the regional classification branch also increases from 97.09% to 98.18%.