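Interferential Line Elimination in Document Image Based on Greedy Algorithm

Fund Project:

National Natural Science Foundation of China (61672369); Special Fund of the Central Government Guiding Local Science and Technology Development (2018L3013); General Program of the Natural Science Foundation of Fujian Province (2015J01669, 2017J01651); Young and Middle-aged Teachers Project of the Education Department of Fujian Province (JA15522)
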
    Abstract:

    Documents often contain ruled lines, hand-drawn strokes, and other lines that serve special purposes. When such documents are digitized, for example by scanning, and must then be recognized and converted into text codes, these lines interfere with OCR and lower the recognition rate of the document content. This paper therefore proposes a new algorithm for removing interference lines from document images. The document image is first binarized, with the binarization accounting for uneven illumination. The foreground is then thinned to single-pixel width to reduce the influence of line thickness. Next, an improved greedy algorithm computes weights for line segments in the horizontal and vertical directions, and segments with higher weights are judged to be interference lines. Finally, the membership of each foreground pixel in the image is decided by its distance from the interference lines, which yields a complete restored document image. Simulation experiments show that the proposed algorithm removes interference lines effectively, particularly where the lines touch the text: it removes the lines with little effect on the quality of the document image, runs quickly, and achieves good removal results, providing a sound basis for subsequent OCR.
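The abstract does not spell out the weighting rule or the distance test, so the following Python sketch only illustrates the last two steps under stated assumptions; it is not the authors' method. It assumes an image that is already binarized and thinned, stored as a list of rows (1 = foreground). It traces roughly horizontal strokes greedily from left to right, uses the number of pixels on a traced path as its weight, and then clears foreground pixels within `max_dist` rows of a detected line unless a taller vertical stroke passes through that column. The names `trace_right`, `find_horizontal_lines`, `vertical_run`, and `remove_lines`, and the parameters `min_weight` and `max_dist`, are all hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]      # (row, column)
Binary = List[List[int]]     # 1 = foreground (ink), 0 = background


def trace_right(img: Binary, y: int, x: int, claimed: Set[Pixel]) -> List[Pixel]:
    """Greedily follow a stroke to the right, one column at a time, never backtracking."""
    h, w = len(img), len(img[0])
    path = [(y, x)]
    while x + 1 < w:
        for dy in (0, -1, 1):  # prefer going straight, then one row up, then one row down
            ny = y + dy
            if 0 <= ny < h and img[ny][x + 1] and (ny, x + 1) not in claimed:
                y = ny
                break
        else:
            break              # no unclaimed foreground pixel in the next column: stop
        x += 1
        path.append((y, x))
    return path


def find_horizontal_lines(img: Binary, min_weight: int) -> List[List[Pixel]]:
    """Return traced paths whose weight (assumed here: pixel count) reaches min_weight."""
    h, w = len(img), len(img[0])
    claimed: Set[Pixel] = set()
    lines: List[List[Pixel]] = []
    for x in range(w):                 # scan columns left to right
        for y in range(h):
            if img[y][x] and (y, x) not in claimed:
                path = trace_right(img, y, x, claimed)
                claimed.update(path)
                if len(path) >= min_weight:
                    lines.append(path)
    return lines


def vertical_run(img: Binary, y: int, x: int) -> int:
    """Length of the vertical foreground run passing through (y, x)."""
    top = y
    while top > 0 and img[top - 1][x]:
        top -= 1
    bottom = y
    while bottom + 1 < len(img) and img[bottom + 1][x]:
        bottom += 1
    return bottom - top + 1


def remove_lines(img: Binary, lines: List[List[Pixel]], max_dist: int = 1) -> Binary:
    """Clear pixels within max_dist rows of a line pixel, keeping taller crossing strokes."""
    out = [row[:] for row in img]
    h = len(img)
    for path in lines:
        for ly, lx in path:
            if vertical_run(img, ly, lx) > 2 * max_dist + 1:
                continue       # assumed heuristic: a taller stroke here belongs to text
            for y in range(max(0, ly - max_dist), min(h, ly + max_dist + 1)):
                out[y][lx] = 0
    return out


# Toy image: a 12-pixel rule in row 3, a vertical stroke crossing it at column 5,
# and a short two-pixel mark in row 0.
img = [[0] * 12 for _ in range(7)]
for x in range(12):
    img[3][x] = 1
for y in range(1, 6):
    img[y][5] = 1
img[0][8] = img[0][9] = 1

lines = find_horizontal_lines(img, min_weight=8)
restored = remove_lines(img, lines, max_dist=1)
```

In this toy image the rule is detected as the only line and removed, while the vertical stroke that crosses it at column 5 and the short mark in row 0 are kept. Vertical lines could be handled the same way by transposing the image first.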

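Cite this article

Wang Ping (王平), Zhang Xiaofeng (张晓峰), Wang Yihuai (王宜怀), Cheng Rengui (程仁贵). Interferential Line Elimination in Document Image Based on Greedy Algorithm. Computer Systems & Applications, 2019, 28(11): 238-244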

History
  • Received: 2019-03-29
  • Last revised: 2019-04-26
  • Published online: 2019-11-08