Computer Systems Applications, 2022, 31(7): 194-202
Chinese Speech Recognition Based on Conformer and N-gram
(1. School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; 2. Shandong Provincial Key Laboratory of Intelligent Building Technology, Jinan 250101, China)
Received: October 28, 2021; Revised: November 29, 2021
Abstract: The Transformer model learns the important information in an input sequence and is more accurate than traditional automatic speech recognition (ASR) models. The Conformer model adds a convolution module to the Transformer encoder, which improves its ability to capture fine-grained local information and further improves performance. This study combines the Conformer model with an N-gram language model (LM) for Chinese speech recognition and obtains good recognition results. Experiments on the AISHELL-1 and aidatatang_200zh datasets show that the Conformer model reduces the character error rate to 5.79% and 5.60%, respectively, which is 5.82% and 2.71% lower than that of the Transformer model. Combined with the N-gram LM, the model reaches its best performance, with character error rates of 4.86% and 5.10%, respectively, and a real-time factor (RTF) of 0.14566. Only when the test signal-to-noise ratio is lowered to 20 dB does the character error rate degrade noticeably, to 8.58%, which indicates that the model has a degree of noise robustness.
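The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: character error rate as the character-level edit distance divided by the reference length, and real-time factor as decoding time divided by audio duration. The function names, the example sentences, and the timing values are hypothetical and are not taken from the paper, whose scoring tools are not described here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting i reference characters to match an empty hypothesis
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def rtf(decode_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted character out of six.
    print(cer("今天天气很好", "今天天汽很好"))
    # Hypothetical timing: 1.5 s of decoding for 10 s of audio.
    print(rtf(1.5, 10.0))
```

An RTF below 1 means the system decodes faster than the audio plays, so the reported value of 0.14566 corresponds to roughly 0.15 s of processing per second of speech.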
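Funding: Major Scientific and Technological Innovation Project of Shandong Province (2019JZZZY010120); Key Research and Development Program of Shandong Province (2019GSF111054)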
Cite this article:
XU Hong-Kui, LU Jiang-Kun, ZHANG Zi-Feng, ZHOU Jun-Jie, HU Wen-Ye, JIANG Tong-Tong, GUO Wen-Tao, LI Zhen-Ye. Chinese Speech Recognition Based on Conformer and N-gram. Computer Systems Applications, 2022, 31(7): 194-202