Natural Speech Emotion Recognition by Integrating Data Balance and Attention Mechanism Based on CNN+LSTM

doi:10.15888/j.cnki.csa.007917

Volume 30, Issue 5, 2021, pp. 269-275

Author:
CHEN Gang
Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China
ZHANG Shi-Qing
Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China
ZHAO Xiao-Ming
Faculty of Mechanical Engineering & Automation, Zhejiang Sci-Tech University, Hangzhou 310018, China; Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China

Abstract:

To address the imbalanced sample distribution across emotion classes in Speech Emotion Recognition (SER) datasets, this study proposes an SER method that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) units with data balancing and an attention mechanism. The method first extracts log-Mel spectrograms from the samples of a speech emotion dataset and divides them into segments according to the sample distribution, so that the data are balanced. It then fine-tunes a pre-trained CNN model on the segmented log-Mel spectrogram dataset to learn high-level features of the speech segments. Next, because different segments of an utterance contribute differently to emotion recognition, the learned segment-level CNN features are fed into an LSTM with an attention mechanism to learn discriminative features, and speech emotions are classified with the LSTM and a Softmax layer. Experimental results on the BAUM-1s and CHEAVD2.0 datasets show that the proposed method performs considerably better than conventional methods.
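The data-balancing step (cutting log-Mel spectrograms into segments according to the sample distribution) can be illustrated with a minimal sketch. The abstract does not state the segment length, the segment shift, or the exact balancing rule, so this assumes a hypothetical rule in which each class gets a segment shift (hop) proportional to its number of utterances, so that rarer classes are cut into more, overlapping segments. The helper names, `seg_len=64` and `base_hop=32` are illustrative assumptions, not values from the paper; each segment simply inherits the emotion label of its utterance.

```python
from collections import Counter

import numpy as np


def segment_spectrogram(log_mel, seg_len, hop):
    """Cut a (n_mels, n_frames) log-Mel spectrogram into fixed-length segments.

    Utterances shorter than seg_len are zero-padded to exactly one segment.
    Returns an array of shape (n_segments, n_mels, seg_len).
    """
    n_frames = log_mel.shape[1]
    if n_frames < seg_len:
        log_mel = np.pad(log_mel, ((0, 0), (0, seg_len - n_frames)), mode="constant")
        n_frames = seg_len
    starts = range(0, n_frames - seg_len + 1, hop)
    return np.stack([log_mel[:, s:s + seg_len] for s in starts])


def class_hops(class_counts, base_hop):
    """Hypothetical balancing rule: the hop shrinks in proportion to class size,
    so minority classes yield more (overlapping) segments per utterance."""
    largest = max(class_counts.values())
    return {c: max(1, round(base_hop * n / largest)) for c, n in class_counts.items()}


def segment_dataset(spectrograms, labels, seg_len=64, base_hop=32):
    """Segment every utterance with its class-specific hop.

    Each segment inherits the label of the utterance it was cut from.
    """
    hops = class_hops(Counter(labels), base_hop)
    segments, seg_labels = [], []
    for spec, label in zip(spectrograms, labels):
        segs = segment_spectrogram(spec, seg_len, hops[label])
        segments.append(segs)
        seg_labels.extend([label] * len(segs))
    return np.concatenate(segments), seg_labels


# Toy dataset: four "neutral" utterances and one "angry" utterance, 200 frames each.
specs = [np.zeros((64, 200)) for _ in range(5)]
labels = ["neutral"] * 4 + ["angry"]
X, y = segment_dataset(specs, labels)
print(X.shape, Counter(y))
```

In this toy case the 4:1 imbalance between utterances becomes 20:18 at the segment level, because the minority class is cut with a hop of 8 frames instead of 32.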
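The attention step, which weights segments differently because they contribute differently to emotion recognition, can be sketched as weighted pooling over the sequence of hidden states obtained by running an LSTM over the segment-level CNN features. The abstract does not specify the form of the attention, so this sketch assumes simple dot-product scoring with a hypothetical learnable vector `w` followed by a softmax over segments; `W` and `b` stand in for a hypothetical final Softmax classification layer. The hand-set weights below only demonstrate the shapes and behavior, not trained values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_pool(H, w):
    """Pool a sequence of hidden states into one utterance-level vector.

    H: (T, d) hidden states, one row per speech segment (e.g. LSTM outputs).
    w: (d,) scoring vector (hypothetical learnable parameter).
    Returns the pooled (d,) vector and the (T,) attention weights.
    """
    scores = H @ w           # one relevance score per segment
    alpha = softmax(scores)  # weights are positive and sum to 1
    return alpha @ H, alpha  # attention-weighted sum of segment states


def classify(u, W, b):
    """Hypothetical Softmax output layer: class probabilities for vector u."""
    return softmax(W @ u + b)


# Toy example: 3 segments with 2-dimensional hidden states, 4 emotion classes.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.array([2.0, -1.0])
u, alpha = attention_pool(H, w)
probs = classify(u, W=np.ones((4, 2)), b=np.zeros(4))
print(alpha.round(3), probs)
```

With a zero scoring vector the weights become uniform and the pooling reduces to averaging over segments; a learned `w` lets the model emphasize the most emotionally salient segments instead.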

Key words: Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM) unit; attention mechanism; speech emotion recognition

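Citation:

CHEN Gang, ZHANG Shi-Qing, ZHAO Xiao-Ming. Natural Speech Emotion Recognition by Integrating Data Balance and Attention Mechanism Based on CNN+LSTM. Computer Systems & Applications, 2021, 30(5): 269-275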

History

Received: September 23, 2020
Revised: October 21, 2020
Online: May 06, 2021

Copyright: Institute of Software, Chinese Academy of Sciences