Computer Systems & Applications, 2023, 32(1): 337-347
Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression
(School of Software, South China Normal University, Foshan 528225, China)
Received: June 01, 2022; Revised: July 01, 2022
Abstract: In this study, a multimodal emotion recognition method is proposed that fuses the emotion recognition results of speech, electroencephalogram (EEG), and facial expression to judge a person's emotion from multiple angles, addressing the low accuracy and poor model robustness of previous research. For speech signals, a lightweight fully convolutional neural network is designed that learns speech emotion features well while remaining far smaller than competing models. For EEG signals, a tree-structured LSTM model is proposed that can comprehensively learn the emotional features of each stage. For facial signals, GhostNet is used for feature learning, and its structure is improved to substantially boost performance. In addition, an optimal weight distribution algorithm is designed that searches for the reliability of each modality's recognition result and performs decision-level fusion, yielding more comprehensive and accurate results. The above methods achieve accuracies of 94.36% and 98.27% on the EMO-DB and CK+ datasets, respectively, and the proposed fusion method achieves accuracies of 90.25% and 89.33% on the arousal and valence dimensions of the MAHNOB-HCI database. The experimental results show that, compared with single-modality recognition and traditional fusion methods, the proposed multimodal emotion recognition method effectively improves recognition accuracy.
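The decision-level fusion described in the abstract weights each modality's class probabilities by a reliability score and takes the arg-max of the weighted sum. The sketch below illustrates such a scheme for three modalities; the grid search over validation-set weights is an illustrative stand-in, not the paper's exact optimal weight distribution algorithm, and all function names here are hypothetical.

```python
import numpy as np
from itertools import product

def fuse_decisions(probs, weights):
    """Decision-level fusion: weighted sum of per-modality
    class-probability vectors, then arg-max over classes."""
    probs = np.asarray(probs)      # shape (n_modalities, n_classes)
    weights = np.asarray(weights)  # shape (n_modalities,)
    fused = weights @ probs        # shape (n_classes,)
    return int(np.argmax(fused))

def search_optimal_weights(probs_list, labels, step=0.1):
    """Grid-search modality weights summing to 1 that maximize
    accuracy on a validation set (illustrative weight search)."""
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_acc = None, -1.0
    for w1, w2 in product(grid, grid):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:             # skip combinations exceeding 1
            continue
        w = np.array([w1, w2, max(w3, 0.0)])
        preds = [fuse_decisions(p, w) for p in probs_list]
        acc = float(np.mean([p == y for p, y in zip(preds, labels)]))
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc
```

For example, with per-sample probability vectors from the speech, EEG, and face classifiers, `fuse_decisions` returns the class favored by the weighted consensus, and `search_optimal_weights` picks the weight triple that performs best on held-out data.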
Funding: STI 2030 "Brain Science and Brain-Inspired Research" Major Project (2022ZD0208900); National Natural Science Foundation of China General Program (62076103)
Citation:
FANG Wei-Jie, ZHANG Zhi-Hang, WANG Heng-Chang, LIANG Yan, PAN Jia-Hui. Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression. Computer Systems & Applications, 2023, 32(1): 337-347