Computer Systems Applications, 2024, 33(4): 60-68
Speech Enhancement Based on Deep Complex Axial Self-attention Convolutional Recurrent Network
(1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China; 2. School of Information Engineering, Lanzhou City University, Lanzhou 730020, China; 3. School of Information and Electrical Engineering, Ludong University, Yantai 264025, China)
Received: October 07, 2023    Revised: November 09, 2023
Abstract: Inaccurate phase estimation in single-channel speech enhancement degrades the quality of the enhanced speech. To address this problem, this study proposes a speech enhancement method based on a deep complex axial self-attention convolutional recurrent network (DCACRN), which enhances speech magnitude and phase information simultaneously in the complex domain. First, an encoder built from complex convolutional networks extracts complex-valued features from the input speech signal, and a convolutional skip-connection module maps these features into a high-dimensional space for feature fusion, strengthening information interaction and gradient flow. Then, an encoder-decoder structure based on the axial self-attention mechanism is designed to improve the model's temporal modeling and feature extraction abilities. Finally, the decoder reconstructs the speech signal, and a hybrid loss function is used to optimize the network and improve the quality of the enhanced speech. Experiments are conducted on the public Valentini and DNS Challenge datasets. The results show that the proposed method improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) compared with other models. On the non-reverberant dataset, PESQ is 12.8% higher than that of the deep cosine transform convolutional recurrent network (DCTCRN) and 3.9% higher than that of the deep complex convolutional recurrent network (DCCRN), which validates the effectiveness of the proposed model for speech enhancement.
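The abstract names axial self-attention but gives no layer details, so the following is only a minimal sketch of the general idea, not the DCACRN module itself. It assumes a real-valued feature map of shape (time, frequency, channels), uses the input directly as queries, keys, and values (no learned projections), and adds a residual connection after each axis. The function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., L, C); leading axes are treated as batch.
    Assumption: Q = K = V = x, i.e. no learned projection matrices.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, C)


def axial_self_attention(x):
    """Apply attention along frequency, then along time.

    x has shape (T, F, C). Each pass attends along one axis only,
    with the other axis acting as a batch dimension.
    """
    # Frequency axis: batch over T frames, sequence length F.
    y = x + attend(x)
    # Time axis: move time into the sequence position, shape (F, T, C).
    yt = np.swapaxes(y, 0, 1)
    yt = yt + attend(yt)
    return np.swapaxes(yt, 0, 1)  # back to (T, F, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((100, 64, 16))  # (T, F, C), made-up sizes
    print(axial_self_attention(features).shape)
```

Splitting attention by axis computes on the order of T·F·(T + F) attention scores rather than the (T·F)² needed for full attention over every time-frequency position, which is the usual motivation for the axial form.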
Funding: Key Research and Development Program of Gansu Province (22YF7GA130)
Citation:
CAO Jie, WANG Qiao, LIANG Hao-Peng, WANG Chen-Zhang, LI Xiao-Xu, YU Hong. Speech Enhancement Based on Deep Complex Axial Self-attention Convolutional Recurrent Network. Computer Systems Applications, 2024, 33(4): 60-68