Chinese Speech Emotion Recognition Based on Attention-CGRU Network
Authors: Wang Maolin (王茂林), Hao Gang (郝刚)
    Abstract:

    Correctly recognizing the emotional information carried in speech can substantially improve the efficiency of human-computer interaction. Current speech emotion recognition systems mainly consist of two steps: speech feature extraction and speech feature classification. To improve recognition accuracy, this work uses the spectrogram rather than traditional acoustic features as the model input, and adopts an attention-based CGRU network to extract the frequency-domain and time-domain information contained in the spectrogram. The experimental results show that introducing the attention mechanism helps reduce interference from redundant information. Compared with the LSTM-based baseline model, the GRU-based model achieves higher prediction accuracy, converges faster during training, and needs only 60% of the baseline's training time.
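
The abstract does not spell out how the attention mechanism weights the recurrent outputs. As a rough illustration only, the sketch below shows one common form, attention pooling over time steps: each frame-level feature vector is scored, the scores are normalized with a softmax, and the frames are summed with those weights so that less informative frames contribute less. The function names, the dot-product scoring, and the `query` vector are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weighted sum of frame features (hypothetical helper, not the paper's code).

    frames: list of T feature vectors of length D, e.g. per-time-step
            outputs of a recurrent layer.
    query:  scoring vector of length D; in a trained model this would be
            learned, here it is fixed by hand (assumption).
    """
    # Dot-product score for each time step.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted sum across time, one value per feature dimension.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, pooled


# Toy input: three time steps, two features each.
frames = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
weights, pooled = attention_pool(frames, query)
```

In this toy input the second frame has the largest score, so it receives the largest weight and dominates the pooled vector. In a real model the scoring parameters are trained jointly with the rest of the network.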

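Cite this article

Wang Maolin, Hao Gang. Chinese Speech Emotion Recognition Based on Attention-CGRU Network. Computer Systems & Applications (计算机系统应用), 2023, 32(1): 296-301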

History
  • Received: 2022-02-10
  • Last revised: 2022-03-03
  • Published online: 2022-10-28