Computer Systems & Applications, 2021, 30(12): 218-225

Emotion Recognition Based on Visual and Audiovisual Perception System

LONG Ying-Chao, DING Mei-Rong, LIN Gui-Jin, LIU Hong-Ye, ZENG Bi-Qing

(School of Software, South China Normal University, Foshan 528225, China)

Received: March 5, 2021; Revised: April 7, 2021

Abstract: Emotion recognition is an active area of human-computer interaction, with applications in medicine, education, safe driving, and e-commerce. Emotions are expressed mainly through facial expressions, voice, and speech content, and features such as facial muscle movement, tone of voice, and intonation differ from one emotion to another. Recognition from a single-modality feature therefore tends to be inaccurate. Because expressed emotions are perceived mainly through vision and hearing, this study proposes a multimodal expression recognition algorithm based on an audiovisual perception system. Emotion features are first extracted from the speech and image modalities, and several classifiers are designed to run emotion classification experiments on each single feature, yielding multiple single-feature expression recognition models. In the multimodal speech-and-image experiments, a late fusion strategy is proposed for feature fusion. Given the weak dependence among the different models, weighted voting is used for model fusion, yielding a fused expression recognition model built on the multiple single-feature models. Experiments are conducted on the AFEW dataset. A comparison of the recognition results of the fused model with those of the single-feature models verifies that multimodal emotion recognition based on the audiovisual perception system outperforms single-modality recognition.

Keywords: emotion recognition; model fusion; multimodal; audiovisual perception system
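
The weighted voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the model names, the weight values, and the choice to weight each model by something like its validation accuracy are all assumptions made for illustration, since the abstract does not specify them. Each single-feature model casts one vote for its predicted emotion, the vote counts for that model's weight, and the emotion with the largest total wins.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Fuse per-model label predictions for one sample by weighted voting.

    predictions: dict mapping a (hypothetical) model name to its predicted label.
    weights: dict mapping the same model names to non-negative weights,
             e.g. validation accuracy (an assumption, not from the paper).
    Returns the winning label and the accumulated score of every voted label.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")

    scores = defaultdict(float)
    for model, label in predictions.items():
        # Each model adds its own weight to the label it predicted.
        scores[label] += weights[model]

    # On a tie, max() keeps the label that received a vote first.
    winner = max(scores, key=scores.get)
    return winner, dict(scores)


# Hypothetical single-feature models: one audio model and two image models.
predictions = {"audio_model": "Happy", "face_model_a": "Sad", "face_model_b": "Happy"}
weights = {"audio_model": 0.30, "face_model_a": 0.45, "face_model_b": 0.35}

label, scores = weighted_vote(predictions, weights)
print(label)  # Happy
```

In this example the single strongest model votes "Sad", but the two weaker models that agree on "Happy" outweigh it together. Voting of this kind operates on each model's final decision, which is why it suits models that are only weakly dependent on one another.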

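Funding: National Natural Science Foundation of China (61876067); Special Project in Key Fields of Artificial Intelligence for Guangdong Universities (2019KZDZX1033); Guangdong Provincial Key Laboratory of Cyber-Physical Systems Construction Project (2020B1212060069)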

Cite this article:
LONG Ying-Chao, DING Mei-Rong, LIN Gui-Jin, LIU Hong-Ye, ZENG Bi-Qing. Emotion Recognition Based on Visual and Audiovisual Perception System. Computer Systems & Applications, 2021, 30(12): 218-225

Author Name	Affiliation	E-mail
LONG Ying-Chao	School of Software, South China Normal University, Foshan 528225, China
DING Mei-Rong	School of Software, South China Normal University, Foshan 528225, China	362034935@qq.com
LIN Gui-Jin	School of Software, South China Normal University, Foshan 528225, China
LIU Hong-Ye	School of Software, South China Normal University, Foshan 528225, China
ZENG Bi-Qing	School of Software, South China Normal University, Foshan 528225, China