Computer Systems Applications, 2016, 25(4): 221-225

Hybrid Segmentation Algorithm for Chinese Text Using First Pinyin Letter Index

YANG Jin-Cai¹, CHEN Zhong-Zhong¹, XIE Fang², HU Jin-Zhu¹

(1. School of Computer Science, Central China Normal University, Wuhan 430079, China; 2. School of Computer Science, Hubei University of Technology, Wuhan 430068, China)

Received: July 28, 2015; Revised: September 21, 2015

Abstract: Automatic Chinese word segmentation underpins web text mining and other Chinese information processing applications, and the rapid growth of these applications places higher demands on segmentation technology. This paper presents a new segmentation algorithm, FPLS. Its segmentation dictionary uses the first letter of a word's Pinyin as the first-level index and the number of characters in the word as the second-level index. The algorithm applies bidirectional matching and introduces rules to resolve segmentation ambiguity. Compared with existing fast segmentation algorithms, FPLS achieves both higher efficiency and higher accuracy.

Keywords: Chinese word segmentation; Pinyin index; bidirectional matching; ambiguity resolution
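The dictionary structure and matching strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FPLS implementation. Several things here are assumptions made for illustration: the names `PINYIN_INITIAL`, `Lexicon`, `forward_match`, `backward_match`, and `segment` are hypothetical; the first-level key is taken to be the Pinyin initial of the word's first character; the Pinyin table is a tiny hand-written stand-in for a full one; and the tie-breaking rule (fewer words, then fewer single-character words, then the backward result) is a common heuristic standing in for the paper's rules, which the abstract does not spell out.

```python
from collections import defaultdict

# Hypothetical toy table: character -> first letter of its Pinyin.
# A real system would need a complete character-to-Pinyin table.
PINYIN_INITIAL = {"研": "y", "究": "j", "生": "s", "命": "m", "起": "q", "源": "y"}


class Lexicon:
    """Two-level index: Pinyin initial -> word length -> set of words."""

    def __init__(self, words):
        self.index = defaultdict(lambda: defaultdict(set))
        self.max_len = 1
        for word in words:
            # Assumption: the key is the Pinyin initial of the first character.
            letter = PINYIN_INITIAL[word[0]]
            self.index[letter][len(word)].add(word)
            self.max_len = max(self.max_len, len(word))

    def __contains__(self, candidate):
        letter = PINYIN_INITIAL.get(candidate[0])
        if letter is None:
            return False
        # .get avoids creating empty buckets in the defaultdicts.
        return candidate in self.index.get(letter, {}).get(len(candidate), ())


def forward_match(text, lex):
    """Forward maximum matching; unmatched characters become single-character words."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(lex.max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lex:
                words.append(piece)
                i += n
                break
    return words


def backward_match(text, lex):
    """Backward maximum matching, scanning from the end of the text."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(lex.max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or piece in lex:
                words.append(piece)
                j -= n
                break
    return words[::-1]


def segment(text, lex):
    """Run both directions; on disagreement apply a simple heuristic rule."""
    fwd, bwd = forward_match(text, lex), backward_match(text, lex)
    if fwd == bwd:
        return fwd

    def cost(ws):
        # Assumed rule: fewer words first, then fewer single-character words.
        return (len(ws), sum(len(w) == 1 for w in ws))

    return min(bwd, fwd, key=cost)  # ties go to the backward result
```

With the toy lexicon `Lexicon(["研究", "研究生", "生命", "命", "起源"])`, forward matching splits 研究生命起源 as 研究生 / 命 / 起源, backward matching gives 研究 / 生命 / 起源, and the heuristic picks the backward result because it contains no single-character word. Bucketing by initial letter and word length means each lookup only touches the small set of words that could possibly match the candidate.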

Funding: Social Science Fund of the Ministry of Education (13YJAZH117); National Social Science Fund (14BYY093)

Author Name	Affiliation
YANG Jin-Cai	School of Computer Science of Central China Normal University, Wuhan 430079, China
CHEN Zhong-Zhong	School of Computer Science of Central China Normal University, Wuhan 430079, China
XIE Fang	School of Computer Science of Hubei University of Technology, Wuhan 430068, China
HU Jin-Zhu	School of Computer Science of Central China Normal University, Wuhan 430079, China

Cite this article:
YANG Jin-Cai, CHEN Zhong-Zhong, XIE Fang, HU Jin-Zhu. Hybrid Segmentation Algorithm for Chinese Text Using First Pinyin Letter Index. Computer Systems Applications, 2016, 25(4): 221-225