Computer Systems & Applications (计算机系统应用), 2020, Vol. 29, Issue (10): 248–254

Application of Text Detection and Recognition in Fine-Grained Image Classification
JIANG Qian, LIU Man
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Abstract: Fine-grained image classification is an important branch of deep-learning-based image classification. Images from different classes are often very similar in their features, and no particularly distinctive feature can be used to tell them apart. This makes fine-grained image classification harder than general image classification, so traditional image classification methods need to be optimized for it. General image classification is usually trained on visual, pixel-level features, but applying this approach directly to fine-grained classification is not well suited and leaves room for improvement, whereas non-pixel-level features can also help distinguish between classes. We therefore propose combining textual and visual information in image classification to make full use of the features in an image: text detection and recognition algorithms are combined with a general image classification method and applied to fine-grained image classification. Experiments on the Con-text dataset show that the proposed algorithm significantly improves classification accuracy.
Key words: text detection; text recognition; OCR; image classification; fine-grained image classification

1 Overview

Fig. 1 Flowchart of the algorithm

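2 Related work

2.1 Deep-learning-based text detection methods

2.2 Deep-learning-based text recognition methods

2.3 Deep-learning-based image classification methods

2.4 Common fine-grained image classification methods and applications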

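3 A fine-grained image classification algorithm combining text recognition and image classification

3.1 Text detection

Fig. 2 Architecture of EAST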

$l'_w = 0.7 \times l_w$ (1)

$l'_w = 0.9 \times l_w$ (2)

3.2 Text recognition

Fig. 3 Architecture of CRNN

3.3 Image classification

3.4 Text classification

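1) Analyze and understand the data. Before classification, count the words that appear in each building category and identify the key words that represent that category, i.e., summarize the main keywords of each class.

2) Refine the classification logic for recognized words. In addition to exact matching, and based on an analysis of the experimental results, a recognized word is assigned to a keyword's class whenever its characters match that keyword in order and cover at least 50% of the keyword.

3) If an image contains several text regions and therefore yields several classification results, the class that occurs most often is taken; if several classes occur equally often, the class of the keyword with the highest match ratio is taken.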
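The three rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`build_keywords`, `lcs_len`, `match_word`, `classify_image`) and the `top_k` parameter are hypothetical; step 1 is approximated by keeping the most frequent words of each class (the paper's keyword selection may also involve manual review); and "characters match in order" is interpreted as the length of the longest common subsequence between the recognized word and the keyword, divided by the keyword length.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Keywords = Dict[str, List[str]]  # class name -> its keywords


def build_keywords(words_by_class: Dict[str, List[str]], top_k: int = 3) -> Keywords:
    """Step 1 (assumed heuristic): keep the top_k most frequent words of each class."""
    return {cls: [w for w, _ in Counter(ws).most_common(top_k)]
            for cls, ws in words_by_class.items()}


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence, i.e. characters shared in the same order."""
    dp = [0] * (len(b) + 1)
    for ca in a:
        prev = 0  # previous row's dp[j - 1]
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ca == cb else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]


def match_word(word: str, keywords: Keywords,
               threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Step 2: (class, match ratio) of the best keyword covered by at least `threshold`.

    The ratio is the in-order match length divided by the keyword length (assumed
    interpretation), so an exact match scores 1.0. Returns None if nothing qualifies.
    """
    w = word.lower()
    best: Optional[Tuple[str, float]] = None
    for cls, kws in keywords.items():
        for kw in kws:
            k = kw.lower()
            if not k:
                continue
            ratio = lcs_len(w, k) / len(k)
            if ratio >= threshold and (best is None or ratio > best[1]):
                best = (cls, ratio)
    return best


def classify_image(words: List[str], keywords: Keywords,
                   threshold: float = 0.5) -> Optional[str]:
    """Step 3: majority vote over matched words; ties go to the highest match ratio."""
    matches = [m for m in (match_word(w, keywords, threshold) for w in words) if m]
    if not matches:
        return None
    votes = Counter(cls for cls, _ in matches)
    best_ratio: Dict[str, float] = {}
    for cls, ratio in matches:
        best_ratio[cls] = max(ratio, best_ratio.get(cls, 0.0))
    return max(votes, key=lambda c: (votes[c], best_ratio[c]))


keywords = {"cafe": ["coffee", "cafe"], "bakery": ["bakery", "bread"]}
print(classify_image(["COFFE", "bakry", "cafe"], keywords))  # -> cafe
```

In the usage line, the misrecognized "COFFE" still reaches 5/6 of "coffee" and "bakry" reaches 5/6 of "bakery", so the image gets two votes for the hypothetical class `cafe` and one for `bakery`. Note that a 50% threshold is permissive for short keywords, which is why the tie-break by match ratio matters.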

4 Experimental results and analysis

4.1 Dataset

4.2 Parameter settings

4.3 Performance metrics

$P(k) = \dfrac{TP}{TP + FP}$ (3)

$R(k) = \dfrac{TP}{TP + FN}$ (4)

$AP = \sum\limits_{k = 1}^{n} P(k)\,\bigl(R(k) - R(k - 1)\bigr)$ (5)
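Equations (3) to (5) can be illustrated with a short Python sketch. It assumes, as is common for AP but not stated above, that the predictions for one class are sorted by descending confidence, that `hits[k - 1]` records whether the k-th prediction is correct, that TP, FP and FN in P(k) and R(k) are counted over the top k predictions, and that R(0) = 0. The function names are hypothetical.

```python
from typing import List, Tuple


def precision_recall_at_k(hits: List[bool], num_positives: int) -> Tuple[List[float], List[float]]:
    """P(k) and R(k) of Eqs. (3) and (4), counting TP/FP/FN over the top-k predictions.

    hits[k - 1] is True if the k-th most confident prediction is correct (assumption);
    num_positives is the number of ground-truth samples of the class.
    """
    precisions, recalls = [], []
    tp = 0
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        fp = k - tp                  # wrong predictions among the top k
        fn = num_positives - tp      # ground-truth samples not yet retrieved
        precisions.append(tp / (tp + fp))
        recalls.append(tp / (tp + fn) if num_positives > 0 else 0.0)
    return precisions, recalls


def average_precision(hits: List[bool], num_positives: int) -> float:
    """AP of Eq. (5): sum over k of P(k) * (R(k) - R(k - 1)), with R(0) = 0."""
    precisions, recalls = precision_recall_at_k(hits, num_positives)
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


print(round(average_precision([True, False, True, False], num_positives=2), 4))  # -> 0.8333
```

For the ranked list in the usage line, recall only increases at ranks 1 and 3, so AP = 1 × 0.5 + (2/3) × 0.5 = 5/6; wrong predictions lower AP only through the precision values at the ranks where recall rises.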

4.4 Analysis of experimental results

5 Conclusion

References

[1] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA. 2009. 248–255.

[2] Karaoglu S, van Gemert JC, Gevers T. Con-text: Text detection using background connectivity for fine-grained object classification. Proceedings of the 21st ACM International Conference on Multimedia. Barcelona, Spain. 2013. 757–760.

[3] Zhou XY, Yao C, Wen H, et al. EAST: An efficient and accurate scene text detector. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA. 2017. 2642–2651.

[4] Shi BG, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298–2304. DOI: 10.1109/TPAMI.2016.2646371

[5] Graves A, Fernández S, Gomez F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA, USA. 2006. 369–376.

[6] Jiang YY, Zhu XY, Wang XB, et al. R2CNN: Rotational region CNN for orientation robust scene text detection. arXiv: 1706.09579, 2017.

[7] Tian Z, Huang WL, He T, et al. Detecting text in natural image with connectionist text proposal network. Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands. 2016. 56–72.

[8] Liao MH, Zhu Z, Shi BG, et al. Rotation-sensitive regression for oriented scene text detection. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA. 2018. 5909–5918.

[9] Yang QP, Cheng ML, Zhou WM, et al. IncepText: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden. 2018. 1071–1077.

[10] Zhang CQ, Liang BR, Huang ZM, et al. Look more than once: An accurate detector for text of arbitrary shapes. arXiv: 1904.06535, 2019.

[11] Shi BG, Bai X, Belongie S. Detecting oriented text in natural images by linking segments. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA. 2017. 3482–3490.

[12] Liao MH, Shi BG, Bai X, et al. TextBoxes: A fast text detector with a single deep neural network. Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, CA, USA. 2017. 4161–4167.

[13] Liao MH, Shi BG, Bai X. TextBoxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676–3690. DOI: 10.1109/TIP.2018.2825107

[14] Liu YL, Jin LW. Deep matching prior network: Toward tighter multi-oriented text detection. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA. 2017. 3454–3461.

[15] Li X, Wang WH, Hou WB, et al. Shape robust text detection with progressive scale expansion network. arXiv: 1806.02559, 2018.

[16] Baek Y, Lee B, Han D, et al. Character region awareness for text detection. arXiv: 1904.01941, 2019.

[17] He KM, Gkioxari G, Dollár P, et al. Mask R-CNN. Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy. 2017. 2980–2988.

[18] Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv: 1508.04025, 2015.

[19] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324. DOI: 10.1109/5.726791

[20] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25. Lake Tahoe, NV, USA. 2012. 1106–1114.

[21] Szegedy C, Liu W, Jia YQ, et al. Going deeper with convolutions. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA. 2015. 1–9.

[22] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv: 1409.1556, 2014.

[23] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 770–778.

[24] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA. 2015. 3431–3440.

[25] Neubeck A, Van Gool L. Efficient non-maximum suppression. Proceedings of the 18th International Conference on Pattern Recognition. Hong Kong, China. 2006. 850–855.

[26] Hong S, Roh B, Kim KH, et al. PVANet: Lightweight deep neural networks for real-time object detection. arXiv: 1611.08588, 2016.

[27] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany. 2015. 234–241.

[28] Shi XJ, Chen ZR, Wang H, et al. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, QC, Canada. 2015. 802–810.