Computer Systems Applications, 2022, 31(7): 179-185
GAN Speech Enhancement Algorithm with Multi-stage Generator and Time-frequency Discriminator
(1. First Research Institute of the Ministry of Public Security of PRC, Beijing 100048, China; 2. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China; 3. School of Computer Science, Central China Normal University, Wuhan 430077, China)
Received:October 14, 2021    Revised:November 12, 2021
Abstract: The conventional speech enhancement generative adversarial network (SEGAN) uses the time-domain speech waveform as its mapping target. At low signal-to-noise ratios (SNR), the time-domain waveform is drowned in noise, so the enhancement performance of SEGAN degrades sharply and speech distortion becomes severe. To address this problem, a multi-stage time-frequency SEGAN (MS-TFSEGAN) is proposed for speech enhancement. MS-TFSEGAN combines a multi-stage generator with dual time-domain and frequency-domain discriminators, so that the mapping result is refined stage by stage while both time-domain and frequency-domain information is captured. In addition, to further improve the model's ability to learn frequency-domain detail, MS-TFSEGAN adds a frequency-domain L1 loss to the generator loss function. Experimental results show that, under low SNR, MS-TFSEGAN improves speech quality and intelligibility by about 13.32% and 8.97%, respectively, compared with SEGAN, and achieves a relative improvement of 7.3% in character error rate (CER) when used as the front end of a speech recognizer.
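The abstract states that the generator loss includes a frequency-domain L1 term but does not give its form. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes the term is the mean absolute difference between STFT magnitude spectrograms, that the adversarial part uses SEGAN-style least-squares terms for the two discriminators, and that the terms are combined by a weighted sum. The function names, the frame length and hop size, and the weights `lam_time` and `lam_freq` are all hypothetical.

```python
import numpy as np


def stft_magnitude(signal, frame_len=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps the non-negative frequency bins: frame_len // 2 + 1 per frame.
    return np.abs(np.fft.rfft(frames, axis=1))


def time_l1_loss(enhanced, clean):
    """Mean absolute error between the two waveforms."""
    diff = np.asarray(enhanced, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return float(np.mean(np.abs(diff)))


def frequency_l1_loss(enhanced, clean, frame_len=512, hop=128):
    """Mean absolute error between the two magnitude spectrograms."""
    diff = stft_magnitude(enhanced, frame_len, hop) - stft_magnitude(clean, frame_len, hop)
    return float(np.mean(np.abs(diff)))


def generator_loss(d_time_score, d_freq_score, enhanced, clean,
                   lam_time=100.0, lam_freq=100.0):
    """Hypothetical generator objective for one generator stage.

    d_time_score and d_freq_score are the scalar outputs of the time-domain
    and frequency-domain discriminators for the enhanced signal. The
    least-squares adversarial form and both weights are assumptions.
    """
    adversarial = 0.5 * (d_time_score - 1.0) ** 2 + 0.5 * (d_freq_score - 1.0) ** 2
    return (adversarial
            + lam_time * time_l1_loss(enhanced, clean)
            + lam_freq * frequency_l1_loss(enhanced, clean))
```

Under these assumptions the frequency-domain term is zero only when the two magnitude spectrograms match. Because it ignores phase, it complements the time-domain L1 term and cannot replace it.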
Citation:
CHEN Yu, YIN Wen-Bing, GAO Ge, WANG Xiao, ZENG Bang, CHEN Yi. GAN Speech Enhancement Algorithm with Multi-stage Generator and Time-frequency Discriminator. Computer Systems Applications, 2022, 31(7): 179-185