Chinese Speech Recognition Based on Conformer and N-gram

Fund Project: Major Scientific and Technological Innovation Project of Shandong Province (2019JZZZY010120); Key Research and Development Program of Shandong Province (2019GSF111054)

    Abstract:

    The Transformer model learns the important information in an input sequence and achieves higher accuracy than traditional automatic speech recognition (ASR) models. The Conformer model adds a convolution module to the Transformer encoder, which improves its ability to capture fine-grained local information and further improves performance. This study combines the Conformer model with an N-gram language model (LM) for Chinese speech recognition and obtains good recognition results. Experiments on the AISHELL-1 and aidatatang_200zh datasets show that the Conformer model reduces the character error rate (CER) to 5.79% and 5.60%, respectively, which is 5.82% and 2.71% lower than that of the Transformer model. Combined with the N-gram LM, the CER falls further to 4.86% and 5.10%, respectively, the best performance obtained, with a real-time factor (RTF) of 0.14566. Recognition degrades noticeably only when the test signal-to-noise ratio (SNR) is lowered to 20 dB, at which point the CER rises to 8.58%, indicating that the model has a certain degree of noise robustness.
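The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: character error rate as the character-level edit distance divided by the reference length, and real-time factor as decoding time divided by audio duration. The abstract does not describe the scoring procedure actually used, so the function names (`edit_distance`, `character_error_rate`, `real_time_factor`) and the sample sentence are hypothetical and serve only as illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = (substitutions + deletions + insertions) / reference length.

    Spaces are stripped so that only characters are compared (an assumption
    of this sketch, not a detail taken from the paper).
    """
    ref_chars = list(ref.replace(" ", ""))
    hyp_chars = list(hyp.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a six-character reference.
    print(character_error_rate("今天天气很好", "今天天气真好"))
    # Hypothetical timings chosen to reproduce an RTF close to the reported value.
    print(real_time_factor(1.4566, 10.0))
```

Under these definitions, an RTF below 1 means decoding runs faster than real time; the reported 0.14566 corresponds to roughly 1.46 seconds of processing per 10 seconds of audio.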

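Cite this article

Xu Hongkui, Lu Jiangkun, Zhang Zifeng, Zhou Junjie, Hu Wenye, Jiang Tongtong, Guo Wentao, Li Zhenye. Chinese Speech Recognition Based on Conformer and N-gram. Computer Systems & Applications, 2022, 31(7): 194-202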

History
  • Received: 2021-10-28
  • Last revised: 2021-11-29
  • Published online: 2022-05-31