Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression

Fund Project:

Science and Technology Innovation 2030 Major Project on "Brain Science and Brain-Like Research" (2022ZD0208900); General Program of the National Natural Science Foundation of China (62076103)
    Abstract:

    This study proposes a multimodal emotion recognition method that fuses the recognition results of speech, electroencephalogram (EEG), and facial expression to judge a person's emotion from multiple perspectives, addressing the low accuracy and poor model robustness of previous studies. For speech signals, a lightweight fully convolutional neural network is designed, which learns speech emotion features well and has a clear advantage in model size. For EEG signals, a tree-structured LSTM model is proposed, which learns the emotional features of each stage. For facial signals, GhostNet is used for feature learning, and its structure is modified to substantially improve its performance. In addition, an optimal weight distribution algorithm is designed to search for the reliability of each modality's recognition result, which is then used for decision-level fusion to obtain more comprehensive and accurate results. The above methods achieve accuracies of 94.36% and 98.27% on the EMO-DB (speech) and CK+ (facial expression) datasets, respectively, and the proposed fusion method achieves accuracies of 90.25% and 89.33% on the arousal and valence dimensions of the MAHNOB-HCI database, respectively. The experimental results show that the proposed multimodal method improves recognition accuracy compared with single-modality recognition and traditional fusion methods.
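    The decision-level fusion step can be illustrated with a small sketch. The abstract does not give the details of the optimal weight distribution algorithm, so the code below is only a generic stand-in: it assumes each modality outputs class probabilities, fuses them by a weighted sum, and grid-searches weights that sum to 1 to maximize accuracy on a validation set. The function names (`fuse`, `accuracy`, `search_weights`), the step size, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import product


def fuse(probs_by_modality, weights):
    """Weighted sum of per-modality class probabilities for one sample."""
    n_classes = len(probs_by_modality[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs_by_modality))
        for c in range(n_classes)
    ]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(samples, labels, weights):
    """Fraction of samples whose fused prediction matches the label."""
    correct = sum(
        argmax(fuse(probs, weights)) == y for probs, y in zip(samples, labels)
    )
    return correct / len(labels)


def search_weights(samples, labels, n_modalities=3, steps=10):
    """Grid search over weights k/steps that sum to 1.

    Assumption: a plain exhaustive search, used here only as a stand-in
    for the paper's (unspecified) optimal weight distribution algorithm.
    """
    best_w, best_acc = None, -1.0
    for ks in product(range(steps + 1), repeat=n_modalities):
        if sum(ks) != steps:
            continue  # keep only weight vectors on the simplex
        w = tuple(k / steps for k in ks)
        acc = accuracy(samples, labels, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation set (made up): per sample, the [low, high] probabilities
# from the speech, EEG, and face models, in that order.
val_samples = [
    [[0.45, 0.55], [0.80, 0.20], [0.70, 0.30]],
    [[0.20, 0.80], [0.55, 0.45], [0.30, 0.70]],
    [[0.80, 0.20], [0.70, 0.30], [0.45, 0.55]],
    [[0.30, 0.70], [0.20, 0.80], [0.25, 0.75]],
]
val_labels = [0, 1, 0, 1]

best_w, best_acc = search_weights(val_samples, val_labels)
print("best weights:", best_w, "validation accuracy:", best_acc)
```

    In this toy data each modality alone misclassifies one of the four samples, while a suitable weighting classifies all four correctly, which is the kind of gain decision-level fusion aims for. In practice the weights would be searched on held-out data and then fixed for testing.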

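Cite this article

Fang Weijie, Zhang Zhihang, Wang Hengchang, Liang Yan, Pan Jiahui. Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression. Computer Systems & Applications, 2023, 32(1): 337-347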

History
  • Received: 2022-06-01
  • Revised: 2022-07-01
  • Published online: 2022-08-24