Emotion Recognition Based on Visual and Audiovisual Perception System
Author: 龙英潮, 丁美荣, 林桂锦, 刘鸿业, 曾碧卿
Funding: National Natural Science Foundation of China (61876067); Special Project in Key Fields of Artificial Intelligence for Regular Universities in Guangdong Province (2019KZDZX1033); Special Project for the Construction of the Guangdong Provincial Key Laboratory of Cyber-Physical Systems (2020B1212060069)

Abstract:

As a popular topic in human-computer interaction, emotion recognition has been applied in many fields such as medicine, education, safe driving, and e-commerce. Emotions are expressed mainly through facial expressions, voice, and speech, and features such as facial muscle movements, tone, and intonation differ across emotions, so emotion recognition based on a single modal feature tends to be inaccurate. Considering that expressed emotions are perceived chiefly through vision and hearing, this study proposes a multimodal expression recognition algorithm based on an audiovisual perception system. Emotion features are extracted separately from the speech and image modalities, and multiple classifiers are designed to run emotion classification experiments on each single feature, yielding several single-feature expression recognition models. In the multimodal experiments on speech and images, a late fusion strategy is adopted for feature fusion; given the weak dependence among the different models, the weighted voting method is used for model fusion, producing a fused expression recognition model built from multiple single-feature models. Experiments are conducted on the AFEW dataset, and a comparison of the recognition results of the fused model and the single-feature models verifies that multimodal emotion recognition based on the audiovisual perception system outperforms single-modal emotion recognition.
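The weighted-voting late fusion described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released implementation: the two single-feature models (one audio, one image), the example weights, and the seven-class AFEW label set are assumptions made here for demonstration.

    import numpy as np

    # The seven emotion classes annotated in the AFEW dataset.
    CLASSES = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

    def weighted_vote(prob_list, weights):
        """Fuse single-feature models by weighted (soft) voting.

        prob_list: list of (n_samples, n_classes) arrays, one per model,
                   each row holding a sample's class probabilities.
        weights:   one scalar per model (assumed here, e.g. proportional
                   to each model's validation accuracy).
        Returns the fused class index for every sample.
        """
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                           # normalize the model weights
        stacked = np.stack(prob_list)             # (n_models, n_samples, n_classes)
        fused = np.tensordot(w, stacked, axes=1)  # weighted sum over the model axis
        return fused.argmax(axis=1)               # highest-scoring class per sample

    # Toy usage: fuse a hypothetical audio model and image model on 3 samples.
    rng = np.random.default_rng(0)
    audio_probs = rng.dirichlet(np.ones(len(CLASSES)), size=3)
    image_probs = rng.dirichlet(np.ones(len(CLASSES)), size=3)
    pred = weighted_vote([audio_probs, image_probs], weights=[0.4, 0.6])
    print([CLASSES[i] for i in pred])

Soft voting over class probabilities is one common reading of weighted voting; a hard-vote variant would instead weight each model's single top-class decision. Either way, the fusion treats the models as weakly dependent, which is the rationale the abstract gives for voting-based model fusion.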

Cite this article:

龙英潮, 丁美荣, 林桂锦, 刘鸿业, 曾碧卿. Emotion Recognition Based on Visual and Audiovisual Perception System. 计算机系统应用, 2021, 30(12): 218-225.
History
  • Received: 2021-03-05
  • Last revised: 2021-04-07
  • Published online: 2021-12-10