Chinese Speech Emotion Recognition Based on Attention-CGRU Network

doi:10.15888/j.cnki.csa.008769

2023, Vol. 32, No. 1, pp. 296-301

Authors: WANG Mao-Lin, HAO Gang

Affiliation: School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300010, China

Abstract:

Accurately recognizing the emotional information carried in speech can greatly improve the efficiency of human-computer interaction. Current speech emotion recognition systems consist mainly of two steps: speech feature extraction and speech feature classification. To improve recognition accuracy, this work uses the spectrogram rather than traditional acoustic features as the model input, and adopts an attention-based CGRU network to extract the frequency-domain and time-domain information contained in the spectrogram. The experimental results show that introducing the attention mechanism helps reduce interference from redundant information. Compared with a model based on the LSTM network, the GRU-based model achieves higher prediction accuracy and converges faster during training; its training time is only 60% of that of the LSTM-based baseline model.

Key words: speech emotion recognition; attention mechanism; gated recurrent unit (GRU); spectrogram; deep learning
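
The abstract does not state which form of attention the network uses, so the following is only a generic sketch of how attention can down-weight redundant time steps: each frame-level feature vector (a stand-in for a recurrent-layer output) is scored against a query vector, the scores are normalized with a softmax, and the frames are combined as a weighted sum. The names `attention_pool`, `frames`, and `query`, as well as the dot-product scoring, are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Collapse per-frame feature vectors into one utterance-level vector.

    frames: T vectors of dimension D (stand-ins for recurrent-layer outputs).
    query:  a D-dimensional scoring vector (hypothetical; learned in practice).
    Returns (attention weights over the T frames, weighted-sum vector).
    """
    # Dot-product score of every frame against the query (assumed scoring rule).
    scores = [sum(h * q for h, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time: low-weight frames contribute little to the result.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)
    ]
    return weights, pooled


# Toy input: three frames, the middle one scoring highest against the query.
frames = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
weights, pooled = attention_pool(frames, query)
print([round(w, 3) for w in weights])
print([round(v, 3) for v in pooled])
```

In this toy input the middle frame receives the largest weight, so the pooled vector is dominated by it while the two low-scoring frames are largely suppressed. In a trained model the query (or a small scoring network) would be learned jointly with the rest of the network.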

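Cite this article

WANG Mao-Lin, HAO Gang. Chinese Speech Emotion Recognition Based on Attention-CGRU Network. Computer Systems & Applications (计算机系统应用), 2023, 32(1): 296-301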

History

Received: 2022-02-10
Last revised: 2022-03-03
Published online: 2022-10-28
