Speech Enhancement Based on Deep Complex Axial Self-attention Convolutional Recurrent Network

doi:10.15888/j.cnki.csa.009458

Computer Systems & Applications, 2024, Vol. 33, No. 4, pp. 60-68

CSTR: 32024.14.csa.009458
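Authors:
CAO Jie (曹洁)
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China; School of Information Engineering, Lanzhou City University, Lanzhou 730020, China
WANG Qiao (王乔)
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
LIANG Hao-Peng (梁浩鹏)
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
WANG Chen-Zhang (王宸章)
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
LI Xiao-Xu (李晓旭)
School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
YU Hong (于泓)
School of Information and Electrical Engineering, Ludong University, Yantai 264025, China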

                    
Fund Project: Key Research and Development Program of Gansu Province (22YF7GA130)


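Abstract:

Inaccurate phase estimation in single-channel speech enhancement degrades the quality of the enhanced speech. To address this problem, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances both the magnitude and the phase information of speech simultaneously in the complex domain. First, an encoder based on a complex convolutional network extracts complex-valued features from the input speech signal, and a convolutional skip-connection module is introduced to map the features into a high-dimensional space for feature fusion, strengthening information interaction and gradient flow. Then, an encoder-decoder structure based on the axial self-attention mechanism is designed, using axial self-attention to improve the model's temporal modeling and feature extraction capabilities. Finally, the decoder reconstructs the speech signal, and a hybrid loss function is used to optimize the network and improve the quality of the enhanced speech. Experiments on the public Valentini and DNS Challenge datasets show that, compared with the other models evaluated, the proposed method improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). On the non-reverberant dataset, PESQ is 12.8% higher than that of DCTCRN (deep cosine transform convolutional recurrent network) and 3.9% higher than that of DCCRN (deep complex convolutional recurrent network), which validates the effectiveness of the proposed model for speech enhancement.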

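Key words: single-channel speech enhancement; complex convolutional recurrent network; convolutional skip connection; axial self-attention mechanism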
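The abstract states that the encoder extracts complex-valued features with complex convolutions, but this page gives no layer details. The sketch below assumes the widely used DCCRN-style formulation, in which one complex convolution is assembled from four real-valued convolutions: (W_r + jW_i) * (X_r + jX_i) = (W_r * X_r - W_i * X_i) + j(W_r * X_i + W_i * X_r). It is 1-D, single-channel and unpadded for brevity (an encoder of this kind would normally operate on 2-D time-frequency maps), and the helper names are hypothetical.

```python
# Sketch of a complex-valued convolution built from real-valued convolutions.
# Assumption: DCCRN-style formulation
#   (W_r + jW_i) * (X_r + jX_i) = (W_r*X_r - W_i*X_i) + j(W_r*X_i + W_i*X_r)
# 1-D, single channel, 'valid' (no padding), for illustration only.
from typing import List, Sequence, Tuple


def real_conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation, i.e. what deep-learning conv layers compute."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[t + k] for k in range(len(w))) for t in range(n_out)]


def complex_conv1d(
    x_r: Sequence[float], x_i: Sequence[float],
    w_r: Sequence[float], w_i: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: complex convolution from four real convolutions."""
    rr = real_conv1d(x_r, w_r)  # W_r * X_r
    ii = real_conv1d(x_i, w_i)  # W_i * X_i
    ri = real_conv1d(x_i, w_r)  # W_r * X_i
    ir = real_conv1d(x_r, w_i)  # W_i * X_r
    out_r = [a - b for a, b in zip(rr, ii)]  # real part
    out_i = [a + b for a, b in zip(ri, ir)]  # imaginary part
    return out_r, out_i


def reference_complex_conv1d(x: Sequence[complex], w: Sequence[complex]) -> List[complex]:
    """The same operation computed directly with Python complex numbers, for checking."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[t + k] for k in range(len(w))) for t in range(n_out)]


if __name__ == "__main__":
    x = [1 + 2j, 0.5 - 1j, -1 + 0.5j, 2 + 0j]
    w = [0.5 + 0.5j, -1 + 0.25j]
    out_r, out_i = complex_conv1d(
        [z.real for z in x], [z.imag for z in x],
        [z.real for z in w], [z.imag for z in w],
    )
    ref = reference_complex_conv1d(x, w)
    print(all(abs(complex(a, b) - c) < 1e-12 for a, b, c in zip(out_r, out_i, ref)))  # prints True
```

Keeping separate real and imaginary weight sets lets each branch be an ordinary real-valued convolution, which is why this formulation is popular for complex-domain models: the cross terms (W_i * X_i and W_r * X_i + W_i * X_r) are what couple magnitude and phase.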

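The abstract credits axial self-attention with improving temporal modeling and feature extraction. As a rough illustration only, the sketch below assumes the common formulation in which scaled dot-product attention is applied along one axis of a time-frequency feature map at a time (here time first, then frequency). It deliberately omits the learned query/key/value projections, multi-head splitting, residual connections and normalization that a real layer would include; the function names and the [t][f] -> channel-vector layout are hypothetical.

```python
# Sketch of axial self-attention on a time-frequency feature map.
# Assumptions (not from the paper): Q = K = V = input (no learned projections),
# single head, no residual/normalization, time axis first, then frequency axis.
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as x[t][f] -> channel vector of length C


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over one 1-D sequence of vectors."""
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        scores = [scale * sum(qc * kc for qc, kc in zip(q, k)) for k in seq]
        weights = softmax(scores)
        # Each output is a weighted average (convex combination) of the inputs.
        out.append([sum(wt * v[c] for wt, v in zip(weights, seq)) for c in range(d)])
    return out


def axial_attention(x: FeatureMap) -> FeatureMap:
    """Attend along time for every frequency bin, then along frequency for every frame."""
    n_t, n_f = len(x), len(x[0])
    # Time axis: for each frequency bin f, the sequence is x[0][f], ..., x[T-1][f].
    y: FeatureMap = [[[] for _ in range(n_f)] for _ in range(n_t)]
    for f in range(n_f):
        attended = self_attention([x[t][f] for t in range(n_t)])
        for t in range(n_t):
            y[t][f] = attended[t]
    # Frequency axis: for each frame t, the sequence is y[t][0], ..., y[t][F-1].
    return [self_attention(y[t]) for t in range(n_t)]


if __name__ == "__main__":
    x = [[[float(t + f), float(t - f)] for f in range(3)] for t in range(4)]
    y = axial_attention(x)
    print(len(y), len(y[0]), len(y[0][0]))  # prints 4 3 2 (same shape as the input)
```

Restricting each attention pass to one axis means the time pass computes F·T² scores and the frequency pass T·F², i.e. TF(T + F) in total, versus (TF)² for full attention over all T·F positions; this saving is the usual motivation for axial attention on spectrogram-sized inputs.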

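Cite this article:

CAO Jie, WANG Qiao, LIANG Hao-Peng, WANG Chen-Zhang, LI Xiao-Xu, YU Hong. Speech enhancement based on deep complex axial self-attention convolutional recurrent network. Computer Systems & Applications, 2024, 33(4): 60-68.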

History:

Received: 2023-10-07
Last revised: 2023-11-09
Published online: 2024-01-18
