Chinese Speech Recognition Based on Conformer and N-gram

doi:10.15888/j.cnki.csa.008638

Published in: 2022, Vol. 31, No. 7, pp. 194-202

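Funding: Shandong Province Major Science and Technology Innovation Project (2019JZZZY010120); Shandong Province Key Research and Development Program (2019GSF111054)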

Chinese Speech Recognition Based on Conformer and N-gram

Abstract:

The Transformer model learns the important information in an input sequence and is more accurate than traditional automatic speech recognition (ASR) models. The Conformer model adds a convolution module to the Transformer encoder, which improves its ability to capture fine-grained local information and further improves performance. This study combines the Conformer model with an N-gram language model (LM) for Chinese speech recognition and obtains good recognition results. Experiments on the AISHELL-1 and aidatatang_200zh datasets show that the Conformer model reduces the character error rate (CER) to 5.79% and 5.60%, respectively, which is 5.82% and 2.71% lower than that of the Transformer model. Combining it with the N-gram LM gives the best performance, reducing the CER further to 4.86% and 5.10%, respectively, with a real-time factor (RTF) of 0.14566. Recognition degrades noticeably only when the test signal-to-noise ratio is lowered to 20 dB, where the CER rises to 8.58%, which indicates that the model has a degree of noise robustness.
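The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions: character error rate is the character-level edit distance between the reference and the hypothesis divided by the reference length, and real-time factor is processing time divided by audio duration. The function names and the sample strings are hypothetical and chosen for illustration; they are not taken from the paper's evaluation tooling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (illustrative helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted and one deleted character.
    reference = "今天天气很好"
    hypothesis = "今天天汽很"
    print(f"CER: {character_error_rate(reference, hypothesis):.4f}")
    # Hypothetical timing: 1.4566 s of decoding for 10 s of audio.
    print(f"RTF: {real_time_factor(1.4566, 10.0):.5f}")
```

An RTF below 1 means decoding runs faster than the audio plays back. For Chinese, the error rate is computed over characters rather than words because the text has no explicit word boundaries.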

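Cite this article

Xu Hongkui, Lu Jiangkun, Zhang Zifeng, Zhou Junjie, Hu Wenye, Jiang Tongtong, Guo Wentao, Li Zhenye. Chinese Speech Recognition Based on Conformer and N-gram. Computer Systems & Applications, 2022, 31(7): 194-202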

History

Received: 2021-10-28
Last revised: 2021-11-29
Published online: 2022-05-31
