
Computer Systems Applications, 2011, 20(11): 55-58


Word-Class Expansion Method About Training Corpus of Language Model in Restricted Domain

HUANG Yun-Zhu, WEI Wei, LUO Yang-Yu, LI Cheng-Rong

(Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Received: March 9, 2011; Revised: March 30, 2011

Abstract: Collecting a training corpus for a restricted-domain language model takes considerable manpower and resources, and an insufficient corpus often leads to data sparsity. There are two common ways to address this problem: (1) apply a data smoothing algorithm to reduce the perplexity of the model; (2) expand the training corpus. This paper explores a semi-automatic method for expanding the training corpus of a language model. The method computes mutual information to partition a large-scale, non-restricted-domain corpus into word classes, producing a large word-class table; the domain-relevant word classes are then extracted from this table, manually pruned, and used to estimate the parameters of the restricted-domain language model. Experiments show that, when applied to a speech recognition system, the method effectively shortens the time needed to collect the language model training corpus and improves the system's recognition rate.

Keywords: corpus expansion; mutual information; language model; speech recognition; word classes
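
The abstract gives only an outline of the method, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It shows (a) the average mutual information between adjacent word classes, a quantity commonly used to score a word-to-class assignment, and (b) one plausible way to use a pruned word-class table: substituting each in-class word of a domain sentence with the other members of its class. The function names `class_mutual_information` and `expand_corpus`, the substitution strategy, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def class_mutual_information(sentences, word2class):
    """Average mutual information (in bits) between adjacent word classes.

    Words missing from word2class are treated as singleton classes.
    """
    pair_counts = Counter()
    for sent in sentences:
        classes = [word2class.get(w, w) for w in sent]
        pair_counts.update(zip(classes, classes[1:]))
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in pair_counts.items():
        p12 = n / total
        mi += p12 * math.log2(p12 / ((left[c1] / total) * (right[c2] / total)))
    return mi


def expand_corpus(sentences, word_classes):
    """Assumed expansion step: replace each in-class word with every class member.

    The number of generated sentences grows multiplicatively with the
    class sizes, so real use would need a cap or sampling.
    """
    word2members = {w: sorted(members) for members in word_classes for w in members}
    expanded = set()
    for sent in sentences:
        options = [word2members.get(w, [w]) for w in sent]
        for combo in product(*options):
            expanded.add(tuple(combo))
    return sorted(expanded)


# Toy domain corpus and a hand-pruned class table (hypothetical data).
domain_corpus = [["fly", "to", "beijing"]]
city_class = {"beijing", "shanghai", "guangzhou"}
expanded = expand_corpus(domain_corpus, [city_class])
```

In this toy case the single seed sentence yields one variant per city in the class. The expanded sentences could then be passed to any n-gram trainer; how the paper actually feeds the pruned classes into parameter estimation is not specified in the abstract.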


Citation:
HUANG Yun-Zhu, WEI Wei, LUO Yang-Yu, LI Cheng-Rong. Word-Class Expansion Method About Training Corpus of Language Model in Restricted Domain. Computer Systems Applications, 2011, 20(11): 55-58

Author Name	Affiliation
HUANG Yun-Zhu	Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
WEI Wei	Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
LUO Yang-Yu	Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
LI Cheng-Rong	Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China