Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression

Funding: STI2030 Major Project "Brain Science and Brain-Inspired Research" (2022ZD0208900); General Program of the National Natural Science Foundation of China (62076103)

    Abstract:

    In this study, a multimodal emotion recognition method is proposed that fuses the emotion recognition results of speech, electroencephalogram (EEG), and facial expression to judge emotions comprehensively from multiple angles, effectively addressing the low accuracy and poor model robustness of past research. For speech signals, a lightweight fully convolutional neural network is designed that learns speech emotion features well while remaining far more compact than comparable models. For EEG signals, a tree-structured LSTM model is proposed that can comprehensively learn the emotion features of each stage. For facial signals, GhostNet is used for feature learning, and its structure is improved to substantially boost performance. In addition, an optimal weight distribution algorithm is designed that searches for the reliability of each modality's recognition result and performs decision-level fusion, yielding more comprehensive and accurate results. The above methods achieve accuracies of 94.36% and 98.27% on the EMO-DB and CK+ datasets, respectively, and the proposed fusion method achieves accuracies of 90.25% and 89.33% on the arousal and valence dimensions of the MAHNOB-HCI database. The experimental results show that, compared with single-modality recognition and traditional fusion methods, the proposed multimodal emotion recognition method effectively improves recognition accuracy.
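    The decision-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the probability vectors, the three-class setup, and the 0.1 grid step below are all hypothetical, and only the general idea — weighting each modality's softmax output by a searched confidence and keeping the weights that maximize validation accuracy — follows the text.

```python
import itertools
import numpy as np

def fuse(modality_probs, weights):
    """Weighted sum of the per-modality class-probability vectors."""
    return sum(w * p for w, p in zip(weights, modality_probs))

def search_optimal_weights(val_probs, val_labels, step=0.1):
    """Grid-search three modality weights (summing to 1) that maximize
    validation accuracy of the fused prediction."""
    best_w, best_acc = None, -1.0
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in itertools.product(grid, repeat=2):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:          # weights must stay on the simplex
            continue
        w = (w1, w2, max(w3, 0.0))
        preds = [np.argmax(fuse(p, w)) for p in val_probs]
        acc = float(np.mean(np.array(preds) == val_labels))
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

# Hypothetical softmax outputs of the three unimodal models
# (speech, EEG, face) for four validation samples, three classes.
val_probs = [
    [np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.5, 0.2]), np.array([0.6, 0.3, 0.1])],
    [np.array([0.2, 0.6, 0.2]), np.array([0.1, 0.8, 0.1]), np.array([0.4, 0.4, 0.2])],
    [np.array([0.1, 0.3, 0.6]), np.array([0.2, 0.2, 0.6]), np.array([0.3, 0.3, 0.4])],
    [np.array([0.5, 0.4, 0.1]), np.array([0.4, 0.5, 0.1]), np.array([0.3, 0.6, 0.1])],
]
val_labels = np.array([0, 1, 2, 1])

weights, acc = search_optimal_weights(val_probs, val_labels)
```

    At test time, the searched weights are fixed and the fused vector's argmax is the final emotion label; a finer grid step trades search time for weight precision.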

Cite this article:

Fang WJ, Zhang ZH, Wang HC, Liang Y, Pan JH. Multimodal emotion recognition based on speech, EEG and facial expression. Computer Systems & Applications, 2023, 32(1): 337-347.

History
  • Received: 2022-06-01
  • Revised: 2022-07-01
  • Published online: 2022-08-24
Copyright © Institute of Software, Chinese Academy of Sciences