Received: September 23, 2020; Revised: October 21, 2020
Abstract: To address the unbalanced class distribution of samples in speech emotion recognition (SER) datasets, this study proposes an SER method that combines data balancing and an attention mechanism with a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) units. The method first extracts log-Mel spectrograms from the speech samples in an emotion dataset and divides them into segments according to the sample distribution, so that the data can be balanced. A pre-trained CNN model is then fine-tuned on the segmented Mel-spectrogram dataset to learn high-level segment-level speech features. Next, because different segments of an utterance contribute differently to emotion recognition, the learned segment-level CNN features are fed into an LSTM with an attention mechanism to learn discriminative features, and the LSTM and Softmax layers together classify the speech emotion. Experimental results on the BAUM-1s and CHEAVD2.0 datasets show that the proposed method effectively improves SER performance.
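The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: cutting spectrograms into segments in a way that evens out class sizes, and attention-weighted pooling of segment-level features. The function names, the rule of scaling the hop by class frequency, and the use of fixed scores in place of learned attention scores are all assumptions made for illustration, not the authors' method.

```python
import math


def segment_starts(n_frames, seg_len, hop):
    """Start indices of fixed-length segments cut from a spectrogram
    that has n_frames time frames."""
    if n_frames < seg_len:
        return []
    return list(range(0, n_frames - seg_len + 1, hop))


def class_hops(class_counts, base_hop):
    """Assumed balancing rule (not taken from the paper): scale the hop
    by class frequency, so rarer classes use a smaller hop and therefore
    yield more overlapping segments per utterance."""
    largest = max(class_counts.values())
    return {label: max(1, round(base_hop * count / largest))
            for label, count in class_counts.items()}


def attention_pool(features, scores):
    """Softmax-normalise one score per segment, then return the weights
    and the weighted sum of the segment feature vectors.
    In a trained model the scores would be learned; here they are inputs."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    hops = class_hops({"happy": 100, "fear": 25}, base_hop=32)
    for label, hop in hops.items():
        print(label, hop, segment_starts(n_frames=200, seg_len=64, hop=hop))
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With equal scores the pooling reduces to a plain average of the segment features; unequal scores shift the utterance-level representation toward the segments judged more emotionally informative.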
Keywords: Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM) unit; attention mechanism; speech emotion recognition
Funding: National Natural Science Foundation of China (61976149); Natural Science Foundation of Zhejiang Province (LZ20F020002)
Cite this article:
CHEN Gang, ZHANG Shi-Qing, ZHAO Xiao-Ming. Natural Speech Emotion Recognition by Integrating Data Balance and Attention Mechanism Based on CNN+LSTM. Computer Systems Applications (计算机系统应用), 2021, 30(5): 269-275