
Computer Systems & Applications, 2022, 31(7): 194-202


Chinese Speech Recognition Based on Conformer and N-gram

XU Hong-Kui¹,², LU Jiang-Kun¹, ZHANG Zi-Feng¹, ZHOU Jun-Jie¹, HU Wen-Ye¹, JIANG Tong-Tong¹, GUO Wen-Tao¹, LI Zhen-Ye¹

(1. School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; 2. Shandong Provincial Key Laboratory of Intelligent Building Technology, Jinan 250101, China)

Received: October 28, 2021; Revised: November 29, 2021

Abstract: The Transformer model learns the important information in the input sequence and achieves higher accuracy than traditional automatic speech recognition (ASR) models. The Conformer model adds a convolution module to the Transformer encoder, which improves its ability to capture subtle local information and further improves model performance. In this study, the Conformer model is combined with an N-gram language model (LM) for Chinese speech recognition, and good recognition results are obtained. Experiments on the AISHELL-1 and aidatatang_200zh data sets show that the Conformer model reduces the character error rate to 5.79% and 5.60%, respectively, which is 5.82% and 2.71% lower than that of the Transformer model. Combined with the N-gram LM, the character error rate is further reduced to 4.86% and 5.10%, respectively, the best performance obtained, and the real-time factor (RTF) reaches 0.14566. Only when the test signal-to-noise ratio is lowered to 20 dB does the character error rate degrade noticeably, to 8.58%, which indicates that the model has a degree of noise robustness.

Keywords: speech recognition; Transformer; language model (LM); Conformer; deep learning
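The abstract reports results as character error rate and real-time factor without defining them. The sketch below uses the commonly used definitions, which are assumed here rather than taken from the paper: character error rate is the character-level edit distance between the reference and the recognized text divided by the reference length, and RTF is processing time divided by audio duration. The function names and the sample sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Time spent decoding divided by the duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a six-character reference.
    cer = character_error_rate("今天天气很好", "今天天汽很好")
    print(f"{cer:.2%}")  # prints 16.67%
    # Hypothetical timing: 2 s of decoding for 10 s of audio.
    print(real_time_factor(2.0, 10.0))  # prints 0.2
```

Under these definitions a lower character error rate means better recognition, and an RTF below 1 means decoding runs faster than real time.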


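Funding: Major Science and Technology Innovation Project of Shandong Province (2019JZZZY010120); Key Research and Development Program of Shandong Province (2019GSF111054)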

Author Name	Affiliation	E-mail
XU Hong-Kui	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; Shandong Provincial Key Laboratory of Intelligent Building Technology, Jinan 250101, China	xhkui2009@163.com
LU Jiang-Kun	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
ZHANG Zi-Feng	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
ZHOU Jun-Jie	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
HU Wen-Ye	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
JIANG Tong-Tong	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
GUO Wen-Tao	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
LI Zhen-Ye	School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China


Cite this article:
XU Hong-Kui, LU Jiang-Kun, ZHANG Zi-Feng, ZHOU Jun-Jie, HU Wen-Ye, JIANG Tong-Tong, GUO Wen-Tao, LI Zhen-Ye. Chinese Speech Recognition Based on Conformer and N-gram. Computer Systems & Applications, 2022, 31(7): 194-202