End-to-end Speech Recognition Based on Conformer-SE

doi:10.15888/j.cnki.csa.009718


CSTR: 32024.14.csa.009718

Authors:

MA Yong-Jie
School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China

LI Gang
School of Mechanical and Control Engineering, Baicheng Normal University, Baicheng 137000, China

Fund Project: 2022 Science and Technology Research Project of the Jilin Provincial Department of Education (JJKH20220013KJ); 2023 College Students' Innovation and Entrepreneurship Training Program (202310206035)


Abstract:

The end-to-end Transformer model based on the self-attention mechanism performs very well in speech recognition tasks. However, its shallow layers have limited ability to capture local feature information, and it does not fully account for the interdependence between different blocks. To address these issues, this study proposes Conformer-SE, an improved end-to-end speech recognition model. The model first replaces the encoder of the Transformer with the Conformer structure, which strengthens local feature extraction. It then introduces the SE channel attention mechanism to integrate the output of each block into the final output as a weighted sum. Experiments on the public Aishell-1 dataset show that, relative to the original Transformer model, Conformer-SE achieves an 18.18% relative reduction in character error rate.

Key words: speech recognition; end-to-end; Transformer; Conformer; SE attention channel
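
The abstract describes the SE step only at a high level: each block's output contributes to the final output through a weighted sum. The following is a minimal NumPy sketch of one way such a fusion could look, not the authors' implementation. It assumes a standard squeeze-and-excitation layout (global average pooling per block, a two-layer bottleneck with ReLU, and a sigmoid gate). The function name `se_block_fusion`, the weight matrices `w1` and `w2`, and all shapes are hypothetical choices for illustration.

```python
import numpy as np


def se_block_fusion(block_outputs, w1, w2):
    """Fuse per-block encoder outputs with squeeze-and-excitation style weights.

    block_outputs: array of shape (N, T, D), one (T, D) output per encoder block.
    w1: (N, R) bottleneck weights; w2: (R, N) expansion weights (both hypothetical).
    Returns the fused (T, D) output and the (N,) per-block weights.
    """
    h = np.asarray(block_outputs, dtype=np.float64)
    # Squeeze: one scalar descriptor per block (mean over time and feature axes).
    z = h.mean(axis=(1, 2))                    # shape (N,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(z @ w1, 0.0)           # shape (R,)
    s = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # shape (N,)
    # Weighted sum of the block outputs along the block axis.
    fused = np.tensordot(s, h, axes=(0, 0))    # shape (T, D)
    return fused, s


# Toy usage with random data standing in for encoder block outputs.
rng = np.random.default_rng(0)
N, T, D, R = 4, 10, 8, 2                       # blocks, frames, features, bottleneck size
blocks = rng.standard_normal((N, T, D))
w1 = rng.standard_normal((N, R))
w2 = rng.standard_normal((R, N))
fused, weights = se_block_fusion(blocks, w1, w2)
print(fused.shape, weights.round(3))
```

In a trained model `w1` and `w2` would be learned jointly with the encoder; here they are random so that the sketch stays self-contained.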

Cite this article:

MA Yong-Jie, LI Gang. End-to-end Speech Recognition Based on Conformer-SE. Computer Systems & Applications (计算机系统应用), 2024, 33(12): 106-114


History

Received: 2024-05-28
Last revised: 2024-06-26
Published online: 2024-10-31
