Received: August 03, 2012 Revised: September 06, 2012
Abstract: Dictionary lookup is a basic component of Chinese information processing systems and has a significant effect on their efficiency. Dictionary mechanisms for Chinese word segmentation have been studied in China since the mid-to-late 1980s. To improve the lookup efficiency of existing dictionary-based segmentation, a new dictionary mechanism is proposed for words of no more than four characters: a zipper-style (chained) hash based on the positional-notation value of a Chinese character string, called the word value hash mechanism. The machine code of each Chinese character is re-encoded, and the value of a word is computed on the radix principle, with each character weighted more heavily than the character to its right. The weight is equal to the maximum value of all Chinese characters minus the minimum value. A zipper-style hash keyed on these word values is then built, which speeds up word lookup and matching.
Keywords: Chinese information processing; Chinese word segmentation; dictionary mechanism; two thousand decimal; zipper-style Chinese word value hash indexing
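The mechanism summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: Unicode code points in the range 0x4E00 to 0x9FA5 stand in for the re-encoded machine codes, digits start at 1 and the radix is the code span plus 2 so that words of different lengths never share a value (the abstract itself gives the weight as the maximum minus the minimum), and the bucket count of 1021 and the names `word_value` and `WordValueHash` are hypothetical.

```python
# Assumed character range: Unicode code points stand in for re-encoded machine codes.
CJK_MIN = 0x4E00
CJK_MAX = 0x9FA5
# Assumed radix: digits run from 1 to BASE - 1, so no word has a "leading zero".
BASE = CJK_MAX - CJK_MIN + 2
MAX_WORD_LEN = 4  # the abstract limits the mechanism to words of at most 4 characters


def word_value(word):
    """Compute the positional-notation value of a word of 1 to 4 Chinese characters."""
    if not 1 <= len(word) <= MAX_WORD_LEN:
        raise ValueError("word length must be between 1 and 4")
    value = 0
    for ch in word:
        code = ord(ch)
        if not CJK_MIN <= code <= CJK_MAX:
            raise ValueError("character outside the assumed range: %r" % ch)
        # Each character to the left carries BASE times the weight of its right neighbor.
        value = value * BASE + (code - CJK_MIN + 1)
    return value


class WordValueHash:
    """Chained (zipper-style) hash table that stores word values instead of strings."""

    def __init__(self, num_buckets=1021):
        self.buckets = [[] for _ in range(num_buckets)]

    def add(self, word):
        value = word_value(word)
        chain = self.buckets[value % len(self.buckets)]
        if value not in chain:
            chain.append(value)

    def contains(self, word):
        try:
            value = word_value(word)
        except ValueError:
            return False  # too long, empty, or outside the assumed range
        # Matching is an integer comparison along one chain.
        return value in self.buckets[value % len(self.buckets)]


dictionary = WordValueHash()
for entry in ["中国", "信息", "处理"]:
    dictionary.add(entry)
print(dictionary.contains("中国"))
print(dictionary.contains("中"))
```

Because each word is reduced to one integer, a lookup compares integers within a single chain rather than comparing strings character by character, which is the kind of saving the abstract attributes to the word value hash.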
Cite as:
韩莹,王茂发,陈新房,潘志安,张艳霞.汉语自动分词词典新机制—词值哈希机制.计算机系统应用,2013,22(2):233-235
HAN Ying, WANG Mao-Fa, CHEN Xin-Fang, PAN Zhi-An, ZHANG Yan-Xia. New Dictionary Mechanism for Chinese Word Segmentation. COMPUTER SYSTEMS APPLICATIONS, 2013, 22(2): 233-235