Word-Class Expansion Method for the Training Corpus of a Language Model in a Restricted Domain

Volume 20, Issue 11, 2011, pp. 55-58

Authors:
HUANG Yun-Zhu, WEI Wei, LUO Yang-Yu, LI Cheng-Rong
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

                    

Abstract:

Collecting a training corpus for a language model in a restricted domain is time-consuming, and an insufficient corpus leads to training data sparsity. There are two common remedies: reducing model complexity through data smoothing, and expanding the corpus. This paper proposes a semi-automatic method for expanding the training corpus of a language model. A large list of word classes is first generated by computing mutual information over a large-scale general-domain corpus. The word classes related to the restricted domain are then extracted and manually pruned, and the result is used to estimate the parameters of the language model. Experimental results show that the method effectively addresses training data sparsity and improves the recognition rate of a speech recognition system.
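
The abstract does not spell out the clustering or expansion procedure, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that word classes are scored by the average mutual information between the classes of adjacent words (a common criterion in class-based language modeling, not confirmed by the source), and that the corpus is expanded by swapping each word for the other members of its class. The function names, the toy sentences, and the hand-written classes are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def average_mutual_information(sentences, word_to_class):
    """Average mutual information (in bits) between classes of adjacent words.

    Words missing from word_to_class are treated as their own class.
    """
    bigrams = Counter()
    for sent in sentences:
        classes = [word_to_class.get(w, w) for w in sent]
        bigrams.update(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (a, b), n in bigrams.items():
        left[a] += n
        right[b] += n
    ami = 0.0
    for (a, b), n in bigrams.items():
        # p(a, b) / (p(a) * p(b)) simplifies to n * total / (left[a] * right[b])
        ami += (n / total) * math.log2(n * total / (left[a] * right[b]))
    return ami


def expand_corpus(domain_sentences, word_classes):
    """Generate sentences by substituting each word with members of its class."""
    members_of = {}
    for members in word_classes:
        for w in members:
            members_of[w] = sorted(members)
    expanded = set()
    for sent in domain_sentences:
        options = [members_of.get(w, [w]) for w in sent]
        expanded.update(product(*options))
    return sorted(expanded)


# Hypothetical toy data: one in-domain sentence and two hand-picked classes
# standing in for the manually pruned, domain-related word classes.
domain = [("book", "a", "flight", "to", "beijing")]
classes = [{"beijing", "shanghai", "guangzhou"}, {"flight", "train"}]
expanded = expand_corpus(domain, classes)
```

In this sketch the expanded sentences would then be added to the n-gram counts used to estimate the language model. Substitution can produce implausible sentences, which is one reason a manual pruning step like the one the abstract mentions would be useful.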

Key words: corpus expansion; mutual information; language model; speech recognition; word classes

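Citation:

HUANG Yun-Zhu, WEI Wei, LUO Yang-Yu, LI Cheng-Rong. Word-Class Expansion Method for the Training Corpus of a Language Model in a Restricted Domain. Computer Systems & Applications (计算机系统应用), 2011, 20(11): 55-58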


History

Received: March 9, 2011
Revised: March 30, 2011
