
Computer Systems Applications, 2021, 30(3): 14-23


语音识别及端到端技术现状及展望

鱼昆, 张绍阳, 侯佳正, 张少博

(长安大学信息工程学院, 西安 710064)

Survey of Speech Recognition and End-to-End Techniques

YU Kun, ZHANG Shao-Yang, HOU Jia-Zheng, ZHANG Shao-Bo

(School of Information Technology Engineering, Chang’an University, Xi’an 710064, China)


Received: July 27, 2020; Revised: August 25, 2020

Abstract: This paper traces the development of speech recognition, briefly introducing its history and current applications, and describes both traditional techniques and current research progress. Traditional speech recognition takes a statistical approach: it uses spectral features and performs training and matching on a hybrid Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) structure. Current speech recognition models are mainly based on deep learning, in which either a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) can extract features effectively to build the acoustic model. Further research adopts end-to-end techniques, which avoid the propagation of errors between multiple models. The main end-to-end techniques are Connectionist Temporal Classification (CTC) and attention; the latest models and methods focus on attention and attempt to combine it with CTC to achieve better results. Finally, drawing on the authors’ own understanding, the paper summarizes the problems that speech recognition currently faces and its future directions.

Keywords: speech recognition; hidden Markov model (HMM); deep learning; end-to-end; attention mechanism


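Funding: Shaanxi Provincial Technology Innovation Guidance Special Project (S2018-YD-CGRGG-0030); Fundamental Research Funds for the Central Universities, High-Tech Research Cultivation Project (300102240202); Natural Science Basic Research Plan of Shaanxi Province, General Program (2014JM2-5074)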

Cite as:
鱼昆,张绍阳,侯佳正,张少博.语音识别及端到端技术现状及展望.计算机系统应用,2021,30(3):14-23
YU Kun, ZHANG Shao-Yang, HOU Jia-Zheng, ZHANG Shao-Bo. Survey of Speech Recognition and End-to-End Techniques. Computer Systems Applications, 2021, 30(3): 14-23

Author Name	Affiliation	E-mail
YU Kun	School of Information Technology Engineering, Chang’an University, Xi’an 710064, China	624501922@qq.com
ZHANG Shao-Yang	School of Information Technology Engineering, Chang’an University, Xi’an 710064, China
HOU Jia-Zheng	School of Information Technology Engineering, Chang’an University, Xi’an 710064, China
ZHANG Shao-Bo	School of Information Technology Engineering, Chang’an University, Xi’an 710064, China