Computer Systems Applications, 2016, 25(4): 221-225
Hybrid Segmentation Algorithm for Chinese Text Using First Pinyin Letter Index
(1. School of Computer Science, Central China Normal University, Wuhan 430079, China; 2. School of Computer Science, Hubei University of Technology, Wuhan 430068, China)
Received:July 28, 2015    Revised:September 21, 2015
Abstract: Chinese automatic word segmentation underpins web text mining and other Chinese information processing applications, and the rapid growth of those applications places higher demands on segmentation technology. This paper presents a new segmentation algorithm, FPLS, whose dictionary uses the first letter of a word's Pinyin as the first-level index and the number of characters in the word as the second-level index. The algorithm applies bidirectional matching and introduces rules to resolve segmentation ambiguity. Compared with existing fast segmentation algorithms, FPLS achieves higher segmentation efficiency and higher accuracy.
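The dictionary layout and bidirectional matching described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `FIRST_LETTER` table, the demo word list, and all function names are hypothetical, and because the abstract does not state the disambiguation rules, the tie-break used here (prefer fewer words, then fewer single-character words) is a common heuristic assumed only for the sake of the example.

```python
from collections import defaultdict

# Hypothetical hand-made table of Pinyin initials, covering only the demo
# characters. A real system would need a full character-to-Pinyin table.
FIRST_LETTER = {"研": "y", "究": "j", "生": "s", "命": "m", "起": "q", "源": "y"}

def build_index(words):
    """Two-level index: first Pinyin letter of the word -> word length -> words."""
    index = defaultdict(lambda: defaultdict(set))
    for word in words:
        letter = FIRST_LETTER.get(word[0])
        if letter is not None:
            index[letter][len(word)].add(word)
    return index

def in_dict(index, word):
    """Look a candidate up through both index levels without creating entries."""
    letter = FIRST_LETTER.get(word[0])
    if letter is None:
        return False
    return word in index.get(letter, {}).get(len(word), ())

def forward_mm(index, text, max_len=4):
    """Forward maximum matching; unknown single characters pass through."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or in_dict(index, cand):
                out.append(cand)
                i += n
                break
    return out

def backward_mm(index, text, max_len=4):
    """Backward maximum matching, scanning from the end of the text."""
    out, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            cand = text[j - n:j]
            if n == 1 or in_dict(index, cand):
                out.append(cand)
                j -= n
                break
    out.reverse()
    return out

def segment(index, text, max_len=4):
    """Run both directions; on disagreement apply the assumed heuristic rule."""
    fwd = forward_mm(index, text, max_len)
    bwd = backward_mm(index, text, max_len)
    if fwd == bwd:
        return fwd
    # Assumed rule: fewer words, then fewer single-character words;
    # ties go to the backward result because it is listed first.
    score = lambda seg: (len(seg), sum(1 for w in seg if len(w) == 1))
    return min((bwd, fwd), key=score)

DEMO_WORDS = ["研究", "研究生", "生命", "起源"]
demo_index = build_index(DEMO_WORDS)
print(segment(demo_index, "研究生命起源"))
```

On the demo sentence the forward pass greedily takes 研究生 and leaves 命 stranded as a single character, while the backward pass yields three two-character words, so the assumed rule selects the backward result. Indexing by initial letter and then by word length means each lookup only touches the small set of words that share both keys.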
Funding: Ministry of Education Social Science Fund (13YJAZH117); National Social Science Fund (14BYY093)
Cite this article:
YANG Jin-Cai, CHEN Zhong-Zhong, XIE Fang, HU Jin-Zhu. Hybrid Segmentation Algorithm for Chinese Text Using First Pinyin Letter Index. Computer Systems Applications, 2016, 25(4): 221-225