Received: January 29, 2018    Revised: February 27, 2018
Keywords: N-gram model; Laplace smoothing; Katz back-off; Kneser-Ney smoothing; perplexity
Abstract: The N-gram model is one of the most commonly used language models in natural language processing, with wide application in tasks such as speech recognition, handwriting recognition, spelling correction, machine translation, and search engines. However, N-gram models often suffer from the zero-probability problem during training and application, which prevents a good language model from being obtained; smoothing methods such as Laplace smoothing, Katz back-off, and Kneser-Ney smoothing were developed to address this. After introducing the basic principles of these smoothing methods, we use perplexity as a metric to compare the language models trained with each of them.
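The zero-probability problem and its simplest remedy can be illustrated concretely. The sketch below is not from the paper; it is a minimal example, on a made-up toy corpus with illustrative function names, of an add-one (Laplace) smoothed bigram model and the perplexity metric the abstract mentions: every count is incremented by one, so an unseen bigram still receives a small nonzero probability instead of making the perplexity infinite.

```python
import math
from collections import Counter

def laplace_bigram_model(tokens, vocab):
    """Add-one (Laplace) smoothed bigram model:
    P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + |V|)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    v = len(vocab)
    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + v)
    return prob

def perplexity(prob, tokens):
    """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), over the N bigrams."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(w1, w2)) for w1, w2 in pairs)
    return math.exp(-log_sum / len(pairs))

# Toy corpus (illustrative only).
corpus = "the cat sat on the mat the cat ate".split()
p = laplace_bigram_model(corpus, set(corpus))
# The bigram "mat ate" never occurs, yet smoothing keeps its probability nonzero:
print(p("mat", "ate"))        # 1/7 rather than 0
print(perplexity(p, corpus))
```

Without the "+1" terms, `p("mat", "ate")` would be 0 and the log in the perplexity computation would diverge, which is exactly the failure mode that Katz back-off and Kneser-Ney address more carefully than add-one counting.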
Citation:
YIN Chen, WU Min. Survey on N-gram Model. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(10): 33-38