Multimodal Emotion Recognition Based on Speech, EEG and Facial Expression

Funding: STI2030 Major Project "Brain Science and Brain-Inspired Research" (2022ZD0208900); General Program of the National Natural Science Foundation of China (62076103)

    Abstract:

    In this study, a multimodal emotion recognition method is proposed that fuses the emotion recognition results of speech, electroencephalogram (EEG), and facial expression to judge emotions comprehensively from multiple angles, effectively addressing the low accuracy and poor model robustness of past research. For speech signals, a lightweight fully convolutional neural network is designed that learns speech emotion features well while remaining far more compact than comparable models. For EEG signals, a tree-structured LSTM model is proposed that can comprehensively learn the emotion features of each stage. For facial signals, GhostNet is used for feature learning, and its structure is improved to substantially boost performance. In addition, an optimal weight distribution algorithm is designed that searches for the reliability of each modality's recognition result and performs decision-level fusion, yielding more comprehensive and accurate results. The above methods achieve accuracies of 94.36% and 98.27% on the EMO-DB and CK+ datasets, respectively, and the proposed fusion method achieves accuracies of 90.25% and 89.33% on the arousal and valence dimensions of the MAHNOB-HCI database. The experimental results show that, compared with single-modality recognition and traditional fusion methods, the proposed multimodal emotion recognition method effectively improves recognition accuracy.
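    The decision-level fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the probability vectors, the three-class setup, and the 0.1 grid step below are all hypothetical, and only the general idea — weighting each modality's softmax output by a searched confidence and keeping the weights that maximize validation accuracy — follows the text.

```python
import itertools
import numpy as np

def fuse(modality_probs, weights):
    """Weighted sum of the per-modality class-probability vectors."""
    return sum(w * p for w, p in zip(weights, modality_probs))

def search_optimal_weights(val_probs, val_labels, step=0.1):
    """Grid-search three modality weights (summing to 1) that maximize
    validation accuracy of the fused prediction."""
    best_w, best_acc = None, -1.0
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in itertools.product(grid, repeat=2):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:          # weights must stay on the simplex
            continue
        w = (w1, w2, max(w3, 0.0))
        preds = [np.argmax(fuse(p, w)) for p in val_probs]
        acc = float(np.mean(np.array(preds) == val_labels))
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc

# Hypothetical softmax outputs of the three unimodal models
# (speech, EEG, face) for four validation samples, three classes.
val_probs = [
    [np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.5, 0.2]), np.array([0.6, 0.3, 0.1])],
    [np.array([0.2, 0.6, 0.2]), np.array([0.1, 0.8, 0.1]), np.array([0.4, 0.4, 0.2])],
    [np.array([0.1, 0.3, 0.6]), np.array([0.2, 0.2, 0.6]), np.array([0.3, 0.3, 0.4])],
    [np.array([0.5, 0.4, 0.1]), np.array([0.4, 0.5, 0.1]), np.array([0.3, 0.6, 0.1])],
]
val_labels = np.array([0, 1, 2, 1])

weights, acc = search_optimal_weights(val_probs, val_labels)
```

    At test time, the searched weights are fixed and the fused vector's argmax is the final emotion label; a finer grid step trades search time for weight precision.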

Cite this article:

Fang WJ, Zhang ZH, Wang HC, Liang Y, Pan JH. Multimodal emotion recognition based on speech, EEG and facial expression. Computer Systems & Applications, 2023, 32(1): 337-347.

History
  • Received: 2022-06-01
  • Revised: 2022-07-01
  • Published online: 2022-08-24
Copyright © Institute of Software, Chinese Academy of Sciences