Structural Information Recognition of VAT Invoice
    Abstract:

    To simplify the processing of VAT invoices and improve recognition accuracy, we propose a method based on HRNet and YOLOv4 to extract structural information from VAT invoices. First, we detect predefined keypoints on the VAT invoice with HRNet and use them to align the invoice to a standard template. Then we detect the structural-information cells in the aligned invoice with YOLOv4. Finally, we recognize each cell image with CRNN to obtain the structured data. Experimental results on real business VAT invoices show that the proposed method achieves a detection accuracy of 75.7% mAP at an IoU threshold of 0.5, a detection speed of 12.85 fps, and an Element Correct Ratio (ECR) of 69.30%. The results indicate that the proposed method simplifies the process and improves recognition accuracy, and that it is applicable to scenarios that require high real-time performance and must handle complicated noise.
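The alignment step described above (using detected keypoints to warp a photographed invoice onto a standard template) is commonly implemented with a planar homography; in practice one would use OpenCV's cv2.findHomography and cv2.warpPerspective. As a self-contained illustration only, the following NumPy sketch estimates a homography from four hypothetical keypoint correspondences; the paper's actual keypoint set and solver are not specified in the abstract:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from four point
    pairs by solving the standard 8x8 linear system (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply homography H to a single (x, y) point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

With the estimated H, every pixel of the photographed invoice can be mapped into template coordinates, so the downstream cell detector always sees invoices in a consistent layout.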

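The abstract reports an Element Correct Ratio (ECR) but does not give its formal definition. A plausible reading is the fraction of structured fields extracted exactly right; the sketch below assumes exact string match per field (the field names are hypothetical):

```python
def element_correct_ratio(predicted, ground_truth):
    """Assumed ECR: fraction of invoice fields whose predicted text
    exactly matches the ground truth; a missing field counts as wrong."""
    if not ground_truth:
        return 0.0
    correct = sum(1 for key, value in ground_truth.items()
                  if predicted.get(key) == value)
    return correct / len(ground_truth)
```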
    Reference
    [1] Xie ZG. Research on automatic image processing technology for VAT invoices [MS. Thesis]. Shanghai: Shanghai Jiao Tong University, 2015 (in Chinese).
    [2] Yin Y, Wang Y, Jiang Y, et al. The image preprocessing and check of amount for VAT invoices. In: Liang QL, Liu X, Na ZY, et al. eds. Communications, Signal Processing, and Systems. Singapore: Springer, 2019. 44–51.
    [3] Hu ZF. Research and implementation of a batch invoice recognition system based on OCR [MS. Thesis]. Guangzhou: Guangdong University of Technology, 2019 (in Chinese).
    [4] Hu ZF, Zhang XX, Li XZ. Research on a batch invoice recognition system based on convolutional neural networks. Industrial Control Computer, 2019, 32(5): 104–105, 107 (in Chinese). [doi: 10.3969/j.issn.1001-182X.2019.05.043]
    [5] Jiang Y. An invoice recognition system based on deep learning [MS. Thesis]. Nanjing: Nanjing University of Posts and Telecommunications, 2019 (in Chinese).
    [6] Liu H. Text detection and recognition in invoice images based on deep learning [MS. Thesis]. Wuhan: Huazhong University of Science and Technology, 2019 (in Chinese).
    [7] Huang ZW. Design and implementation of an automatic invoice recognition system based on deep learning [MS. Thesis]. Guangzhou: Guangdong University of Technology, 2018 (in Chinese).
    [8] Pan Y. Research on structured recognition methods for bills [MS. Thesis]. Hangzhou: Zhejiang University, 2020 (in Chinese).
    [9] Liu F. Research on an improved adaptive character recognition method for VAT invoices [MS. Thesis]. Xiangtan: Xiangtan University, 2014 (in Chinese).
    [10] Wu JL. Research and implementation of recognition algorithms for valid information in VAT invoices [MS. Thesis]. Qingdao: Qingdao University of Science and Technology, 2018 (in Chinese).
    [11] Liao YQ. Research on automatic recognition algorithms for VAT invoices [MS. Thesis]. Dalian: Dalian Maritime University, 2018 (in Chinese).
    [12] Feng B. Application of intelligent recognition technology in enterprise informatization systems. Electronic Technology & Software Engineering, 2019, (3): 60 (in Chinese).
    [13] Jiang CY, Lu TW, Min F, et al. Invoice text detection and recognition method based on neural networks. Journal of Wuhan Institute of Technology, 2019, 41(6): 586–590 (in Chinese).
    [14] Zhou XY, Yao C, Wen H, et al. EAST: An efficient and accurate scene text detector. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017. 2642–2651.
    [15] Tian Z, Huang WL, He T, et al. Detecting text in natural image with connectionist text proposal network. European Conference on Computer Vision. Amsterdam: Springer, 2016. 56–72.
    [16] Ma JQ, Shao WY, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018, 20(11): 3111–3122. [doi: 10.1109/TMM.2018.2818020]
    [17] Li X, Wang WH, Hou WB, et al. Shape robust text detection with progressive scale expansion network. arXiv: 1806.02559, 2018.
    [18] Liao MH, Wan ZY, Yao C, et al. Real-time scene text detection with differentiable binarization. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11474–11481. [doi: 10.1609/aaai.v34i07.6812]
    [19] Wang PF, Zhang CQ, Qi F, et al. A single-shot arbitrarily-shaped text detector based on context attended multi-task learning. Proceedings of the 27th ACM International Conference on Multimedia. Nice: ACM, 2019. 1277–1285.
    [20] Shi BG, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298–2304. [doi: 10.1109/TPAMI.2016.2646371]
    [21] Liu W, Chen CF, Wong KYK, et al. STAR-Net: A SpaTial attention residue network for scene text recognition. Proceedings of the British Machine Vision Conference. York: BMVC, 2016.
    [22] Shi BG, Wang XG, Lv PY, et al. Robust scene text recognition with automatic rectification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 4168–4176.
    [23] Yu DL, Li X, Zhang CQ, et al. Towards accurate scene text recognition with semantic reasoning networks. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020. 12110–12119.
    [24] Borisyuk F, Gordo A, Sivakumar V. Rosetta: Large scale system for text detection and recognition in images. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London: ACM, 2018. 71–79.
    [25] Li H, Wang P, Shen CH. Towards end-to-end text spotting with convolutional recurrent neural networks. 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017. 5248–5256.
    [26] Liu XB, Liang D, Yan S, et al. FOTS: Fast oriented text spotting with a unified network. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 5676–5685.
    [27] Bartz C, Yang H, Meinel C. STN-OCR: A single neural network for text detection and text recognition. Computer Vision & Pattern Recognition. IEEE, 2017.
    [28] Liao MH, Shi BG, Bai X. TextBoxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676–3690. [doi: 10.1109/TIP.2018.2825107]
    [29] Janssen B, Saund E, Bier E, et al. Receipts2Go: The big world of small documents. Proceedings of the 2012 ACM Symposium on Document Engineering. Limerick: ACM, 2012. 121–124.
    [30] Zhang P, Xu YL, Cheng ZZ, et al. TRIE: End-to-end text reading and information extraction for document understanding. arXiv: 2005.13118, 2020.
    [31] Tuganbaev D, Pakhchanian A, Deryagin D. Universal data capture technology from semi-structured forms. Eighth International Conference on Document Analysis and Recognition (ICDAR’05). Seoul: IEEE, 2005. 458–462.
    [32] Aslan E, Karakaya T, Unver E, et al. An optimization approach for invoice image analysis. 2015 23rd Signal Processing and Communications Applications Conference (SIU). Malatya: IEEE, 2015. 1130–1133.
    [33] Aslan E, Karakaya T, Unver E, et al. A part based modeling approach for invoice parsing. Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Rome: SciTePress, 2016. 390–397.
    [34] Minagawa A, Fujii Y, Takebe H, et al. Logical structure analysis for form images with arbitrary layout by belief propagation. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). Curitiba: IEEE, 2007. 714–718.
    [35] Ha HT, Medved’ M, Nevěřilová Z, et al. Recognition of OCR invoice metadata block types. Proceedings of the 21st International Conference on Text, Speech, and Dialogue. Brno: Springer, 2018. 304–312.
    [36] Nguyen MT, Phan VA, Linh LT, et al. Transfer learning for information extraction with limited data. Proceedings of the 16th International Conference of the Pacific Association for Computational Linguistics. Hanoi: Springer, 2019. 469–482.
    [37] Huang Z, Chen K, He JH, et al. ICDAR2019 competition on scanned receipt OCR and information extraction. 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney: IEEE, 2019. 1516–1520.
    [38] Zhao XH, Niu ED, Wu Z, et al. CUTIE: Learning to understand documents with convolutional universal text information extractor. arXiv: 1903.12363, 2019.
    [39] Santosh KC. g-DICE: Graph mining-based document information content exploitation. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18(4): 337–355. [doi: 10.1007/s10032-015-0253-z]
    [40] Yi F, Zhao YF, Sheng GQ, et al. Dual model medical invoices recognition. Sensors, 2019, 19(20): 4370. [doi: 10.3390/s19204370]
    [41] Liu XJ, Gao FY, Zhang Q, et al. Graph convolution for multimodal information extraction from visually rich documents. arXiv: 1903.11279, 2019.
    [42] Sunder V, Srinivasan A, Vig L, et al. One-shot information extraction from document images using neuro-deductive program synthesis. arXiv: 1906.02427, 2019.
    [43] Palm RB, Winther O, Laws F. CloudScan-a configuration-free invoice analysis system using recurrent neural networks. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Kyoto: IEEE, 2017. 406–413.
    [44] Bart E, Sarkar P. Information extraction by finding repeated structure. Proceedings of the 9th IAPR International Workshop on Document Analysis Systems. Massachusetts: ACM, 2010. 175–182.
    [45] Schulz F, Ebbecke M, Gillmann M, et al. Seizing the treasure: Transferring knowledge in invoice analysis. 2009 10th International Conference on Document Analysis and Recognition. Barcelona: IEEE, 2009. 848–852.
    [46] Palm RB, Laws F, Winther O. Attend, copy, parse end-to-end information extraction from documents. 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney: IEEE, 2019. 329–336.
    [47] Chien PH, Lee GC. A template-based method for identifying input regions in survey forms. Pattern Recognition and Image Analysis, 2011, 21(3): 469–472. [doi: 10.1134/S1054661811020210]
    [48] Peng HC, Long FH, Chi ZR. Document image recognition based on template matching of component block projections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1188–1192. [doi: 10.1109/TPAMI.2003.1227996]
    [49] Sun YY, Mao XF, Hong S, et al. Template matching-based method for intelligent invoice information identification. IEEE Access, 2019, 7: 28392–28401. [doi: 10.1109/ACCESS.2019.2901943]
    [50] Tseng LY, Chen RC. Recognition and data extraction of form documents based on three types of line segments. Pattern Recognition, 1998, 31(10): 1525–1540. [doi: 10.1016/S0031-3203(98)00007-7]
    [51] Tanaka H, Takebe H, Hotta Y. Robust cell extraction method for form documents based on intersection searching and global optimization. 2011 International Conference on Document Analysis and Recognition. Beijing: IEEE, 2011. 354–358.
    [52] Katti AR, Reisswig C, Guder C, et al. Chargrid: Towards understanding 2D documents. arXiv: 1809.08799, 2018.
    [53] Guo H, Qin XM, Liu JM, et al. EATEN: Entity-aware attention for single shot visual text extraction. 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney: IEEE, 2019. 254–259.
    [54] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016. 779–788.
    [55] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017. 6517–6525.
    [56] Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv: 1804.02767, 2018.
    [57] Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal speed and accuracy of object detection. arXiv: 2004.10934, 2020.
    [58] Sun K, Xiao B, Liu D, et al. Deep high-resolution representation learning for human pose estimation. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019. 5686–5696.
    [59] Newell A, Yang KY, Deng J. Stacked hourglass networks for human pose estimation. Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer, 2016. 483–499.
    [60] Chen YL, Wang ZC, Peng YX, et al. Cascaded pyramid network for multi-person pose estimation. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 7103–7112.
    [61] Li WB, Wang ZC, Yin BY, et al. Rethinking on multi-stage networks for human pose estimation. arXiv: 1901.00148, 2019.
    [62] Fischler MA, Bolles RC. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. In: Fischler MA, Firschein O, eds. Readings in Computer Vision. Amsterdam: Elsevier, 1987. 726–740.
    [63] Girshick R. Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). Santiago: IEEE, 2015. 1440–1448.
    [64] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. [doi: 10.1109/TPAMI.2016.2577031]
    [65] Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox detector. Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer, 2016. 21–37.
    [66] Fu CY, Liu W, Ranga A, et al. DSSD: Deconvolutional Single Shot Detector. Computer Vision & Pattern Recognition. IEEE, 2017.
    [67] Du YN, Li CX, Guo RY, et al. PP-OCR: A practical ultra lightweight OCR system. Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
    [68] Jung A, Wada K, et al. imgaug. https://github.com/aleju/imgaug. (2020-06-01).
    [69] Jocher G, Stoken A, Borovec J, et al. ultralytics/yolov5: v3.0. https://github.com/ultralytics/yolov5. (2020-08-13).
Get Citation

Tang J, Tang C. Structural information recognition of VAT invoice. Computer Systems & Applications, 2021, 30(12): 317–325 (in Chinese).
History
  • Received: February 10, 2021
  • Revised: March 18, 2021
  • Online: December 10, 2021
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3