汉语词典查询是中文信息处理系统的重要基础部分, 对系统效率有重要的影响. 国内自80年代中后期就开展了中文分词词典机制的研究, 为了提高现有基于词典的分词机制的查询效率, 对于词长不超过4字的词提出了一种全新的分词词典机制——基于汉字串进制值的拉链式哈希机制即词值哈希机制. 对每个汉字的机内码从新编码, 利用进制原理, 计算出一个词语的词值, 建立一个拉链式词值哈希机制, 从而提高查询匹配速度.
Word query in Chinese Dictionary is essential part in Chinese information processing system. It has a great impact on system efficiency. The Chinese word segmentation has been studied since the late 1980s. In order to improve the existing word query efficiency, for short word of no more than 4 Chinese characters, a new hash algorithm is proposed, named Zipper-style hash indexing based on the value of each characters in Chinese word. The hash value is calculated according to machine code of each character, the weight of the left character is big than the right. The weight is equal to the maximum value of all Chinese characters minus the minimum value. The speed of word query is improved with this kind of Zipper-style Chinese word value hash indexing.