增值税发票信息结构化识别
作者: 唐军, 唐潮

Structural Information Recognition of VAT Invoice
Author: TANG Jun, TANG Chao
    摘要:

    为进一步简化增值税发票识别流程和提高识别效率, 提出了一种基于HRNet和YOLOv4的增值税票面信息结构化识别的方法. 首先利用HRNet进行增值税发票关键点检测, 进行增值税发票对齐; 其次利用YOLOv4进行发票元素的检测; 然后通过CRNN对发票元素进行文本识别; 最后形成结构化数据. 在业务数据集中的实验表明, 检测准确率在0.5 mAP下达到75.7, 检测速度达到12.85 fps, 元素识别率ECR达到69.30%. 实验结果表明该算法能有效简化识别流程, 提高识别准确率, 在实时性要求较高和业务噪声复杂的增值税票据识别中有较好的适应性和广泛的应用前景.
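The element recognition rate (ECR, 元素识别率) quoted above can be read as the fraction of invoice elements whose recognized text exactly matches the ground truth; the paper's precise definition may differ. A minimal sketch under that assumption, with invented field names and values:

```python
# Hedged sketch of an Element Correct Ratio (ECR) metric. We assume an
# element counts as correct only when its recognized text matches the
# ground-truth text exactly; all field names and values are invented.

def element_correct_ratio(predictions, ground_truth):
    """ECR = exactly matched elements / total ground-truth elements."""
    correct = sum(1 for key, value in ground_truth.items()
                  if predictions.get(key) == value)
    return correct / len(ground_truth)

gt = {"invoice_no": "04403221", "date": "2020-06-01",
      "buyer": "某某公司", "amount": "1024.00"}
pred = {"invoice_no": "04403221", "date": "2020-06-01",
        "buyer": "某某公词", "amount": "1024.00"}  # one field misread

print(f"ECR = {element_correct_ratio(pred, gt):.2%}")  # ECR = 75.00%
```

With one of four fields misrecognized, the sketch reports 75%; the paper's 69.30% would be computed over its business dataset with the authors' own matching rules.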

    Abstract:

    To simplify the processing of VAT invoices and improve recognition efficiency, we propose a method based on HRNet and YOLOv4 to extract structural information from VAT invoices. First, HRNet detects predefined keypoints in the VAT invoice, which are used to align the invoice to a standard template. Then, YOLOv4 detects the structural information cells in the aligned invoice. Finally, CRNN recognizes the text in each cell image to produce structured data. Experimental results on real business VAT invoices show that the proposed method achieves a detection accuracy of 75.7 at 0.5 mAP, a detection speed of 12.85 fps, and an element correct ratio (ECR) of 69.30%. The results indicate that the proposed method simplifies the recognition pipeline, improves recognition accuracy, and is well suited to scenarios that require high real-time performance and involve complicated business noise.
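The alignment step described above maps keypoints detected by HRNet onto a standard invoice template, which amounts to estimating a perspective (homography) transform from point correspondences. The sketch below solves that transform from four hypothetical corner keypoints using the standard direct-linear formulation; it is an illustration of the alignment idea, not the paper's implementation, and the coordinate values are made up:

```python
# Sketch of template alignment from four detected corner keypoints
# (hypothetical values). Solves the 3x3 perspective transform H, with
# its bottom-right entry fixed to 1, that maps each detected point to
# the corresponding template corner.

def gauss_solve(A, b):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def solve_homography(src, dst):
    """Build and solve the 8x8 linear system for H from 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = gauss_solve(A, b)
    return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1.0]]

def warp_point(H, p):
    """Apply the perspective transform H to a 2D point."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Detected corners of a skewed invoice photo and the corners of a
# hypothetical 240x140 template they should land on.
detected = [(31, 22), (215, 30), (208, 128), (25, 119)]
template = [(0, 0), (240, 0), (240, 140), (0, 140)]
H = solve_homography(detected, template)
print(warp_point(H, detected[0]))  # maps to approximately (0.0, 0.0)
```

In practice this step would use a library routine such as OpenCV's `getPerspectiveTransform` followed by `warpPerspective` on the whole image, optionally with RANSAC to reject outlier keypoints.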

引用本文 (Cite this article)

唐军, 唐潮. 增值税发票信息结构化识别. 计算机系统应用, 2021, 30(12): 317–325. (Tang J, Tang C. Structural Information Recognition of VAT Invoice. Computer Systems & Applications, 2021, 30(12): 317–325.)
历史 (History)
  • 收稿日期 (Received): 2021-02-10
  • 最后修改日期 (Revised): 2021-03-18
  • 在线发布日期 (Published online): 2021-12-10