Natural Speech Emotion Recognition by Integrating Data Balance and Attention Mechanism Based on CNN+LSTM

doi:10.15888/j.cnki.csa.007917

Computer Systems & Applications, 2021, Vol. 30, No. 5, pp. 269-275

Funding: National Natural Science Foundation of China (61976149); Zhejiang Provincial Natural Science Foundation of China (LZ20F020002)


Authors:

CHEN Gang
Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China

ZHANG Shi-Qing
Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China

ZHAO Xiao-Ming
Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China; Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China

Abstract:

To address the unbalanced sample distribution of datasets in Speech Emotion Recognition (SER), this study proposes an SER method that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) units with data balancing and an attention mechanism. The method first extracts log-Mel spectrograms from the speech samples in a speech emotion dataset and divides them into segments according to the sample distribution, so that the data can be balanced. A pre-trained CNN model is then fine-tuned on the segmented Mel-spectrogram dataset to learn high-level, segment-level speech features. Next, because different segments of an utterance contribute differently to emotion recognition, the learned segment-level CNN features are fed into an LSTM with an attention mechanism to learn discriminative features, and speech emotions are classified with the LSTM and a Softmax layer. Experimental results on the BAUM-1s and CHEAVD2.0 datasets show that the proposed method effectively improves SER performance.

Key words: Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM) unit; attention mechanism; speech emotion recognition
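
The two ideas in the abstract that are easiest to misread are the segment-based data balancing and the attention over segments. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. The balancing rule in `class_hops` (a smaller hop, i.e. more overlap, for rarer classes) is a hypothetical choice, since the abstract only says that segmentation follows the sample distribution. The attention step pools generic per-segment feature vectors with a single score vector `w`; in the paper's setting the LSTM outputs would take the place of `features`. All function names, parameters, and numbers here are illustrative.

```python
import numpy as np

def segment_spectrogram(spec, seg_len, hop):
    """Cut a (n_mels, n_frames) log-Mel spectrogram into fixed-length segments.

    Assumes the utterance has at least seg_len frames.
    """
    n_frames = spec.shape[1]
    starts = range(0, n_frames - seg_len + 1, hop)
    return np.stack([spec[:, s:s + seg_len] for s in starts])

def class_hops(class_counts, base_hop):
    """Hypothetical balancing rule: scale the hop by relative class frequency,
    so rarer classes use a smaller hop and yield more segments per utterance."""
    largest = max(class_counts.values())
    return {c: max(1, round(base_hop * n / largest))
            for c, n in class_counts.items()}

def attention_pool(features, w):
    """Softmax attention over segments: (n_segments, dim) -> (dim,)."""
    scores = features @ w              # one scalar score per segment
    scores = scores - scores.max()     # subtract the max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ features, alpha     # weighted sum of segments, and the weights

# Illustrative usage with made-up class counts and a random stand-in spectrogram.
rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 200))
hops = class_hops({"happy": 300, "fear": 100}, base_hop=30)
segs_happy = segment_spectrogram(spec, seg_len=64, hop=hops["happy"])  # shape (5, 64, 64)
segs_fear = segment_spectrogram(spec, seg_len=64, hop=hops["fear"])    # shape (14, 64, 64)
```

With these made-up counts, an utterance of the rarer class produces roughly three times as many segments as one of the majority class, which is the kind of rebalancing the abstract describes. When `w` is all zeros, every segment gets equal weight and the pooled vector reduces to the mean of the segment features.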

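Cite this article

CHEN Gang, ZHANG Shi-Qing, ZHAO Xiao-Ming. Natural Speech Emotion Recognition by Integrating Data Balance and Attention Mechanism Based on CNN+LSTM. Computer Systems & Applications, 2021, 30(5): 269-275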

History

Received: 2020-09-23
Last revised: 2020-10-21
Published online: 2021-05-06
