GAN Speech Enhancement Algorithm with Multi-stage Generator and Time-frequency Discriminator

doi:10.15888/j.cnki.csa.008587

Computer Systems & Applications, 2022, Vol. 31, No. 7, pp. 179-185.

Abstract:

The traditional speech enhancement generative adversarial network (SEGAN) uses the time-domain speech waveform as its mapping target. At low signal-to-noise ratios (SNRs), the time-domain waveform is buried in noise, so SEGAN's enhancement performance degrades sharply and the enhanced speech is noticeably distorted. To address this problem, this paper proposes a multi-stage time-frequency SEGAN (MS-TFSEGAN) for speech enhancement. MS-TFSEGAN combines a multi-stage generator, which progressively refines the mapping result, with dual time-domain and frequency-domain discriminators, which capture time-domain and frequency-domain information simultaneously. To further improve the model's ability to learn fine frequency-domain detail, MS-TFSEGAN also adds a frequency-domain L1 loss to the generator loss function. Experiments show that under low-SNR conditions, MS-TFSEGAN improves speech quality and intelligibility by about 13.32% and 8.97%, respectively, over SEGAN, and that when used as the front end of a speech recognition system it achieves a 7.3% relative improvement in character error rate (CER).
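
The abstract names three ingredients of the generator objective: adversarial feedback from a time-domain and a frequency-domain discriminator, plus an added frequency-domain L1 loss. The NumPy sketch below shows one way such an objective could be assembled. It is not the paper's formulation, which the abstract does not give. It assumes an LSGAN-style adversarial term per discriminator (as in SEGAN), a time-domain L1 term, and an L1 term on STFT magnitudes. The weights `lambda_time` and `lambda_freq`, the STFT settings (`n_fft=512`, `hop=128`, Hann window), and all function names are hypothetical.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram with a periodic Hann window (STFT settings are assumptions)."""
    window = np.hanning(n_fft + 1)[:-1]
    if len(x) < n_fft:
        # Zero-pad very short signals so at least one frame exists
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=-1))


def lsgan_generator_term(d_scores):
    """LSGAN generator term: pushes discriminator scores on enhanced speech toward 1 ("real")."""
    return 0.5 * np.mean((np.asarray(d_scores) - 1.0) ** 2)


def generator_loss(enhanced, clean, d_time_scores, d_freq_scores,
                   lambda_time=100.0, lambda_freq=100.0):
    """Hypothetical objective: two adversarial terms plus time- and frequency-domain L1 losses."""
    adversarial = lsgan_generator_term(d_time_scores) + lsgan_generator_term(d_freq_scores)
    # Time-domain L1 between enhanced and clean waveforms (SEGAN-style)
    l1_time = np.mean(np.abs(enhanced - clean))
    # Frequency-domain L1 between STFT magnitudes (assumed form of the added loss)
    l1_freq = np.mean(np.abs(stft_magnitude(enhanced) - stft_magnitude(clean)))
    return adversarial + lambda_time * l1_time + lambda_freq * l1_freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)           # 1 s of synthetic signal at an assumed 16 kHz
    enhanced = clean + 0.1 * rng.standard_normal(16000)
    d_scores = np.full(8, 0.7)                   # stand-in discriminator outputs
    print(generator_loss(enhanced, clean, d_scores, d_scores))
```

In an actual model, `d_time_scores` and `d_freq_scores` would come from two separate discriminator networks, one operating on the waveform and one on a spectrogram. The frequency-domain L1 term penalizes spectral errors directly, even when they are hard to see in a noisy waveform.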

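Cite this article

Chen Yu, Yin Wenbing, Gao Ge, Wang Xiao, Zeng Bang, Chen Yi (陈宇, 尹文兵, 高戈, 王霄, 曾邦, 陈怡). GAN Speech Enhancement Algorithm with Multi-stage Generator and Time-frequency Discriminator. Computer Systems & Applications (计算机系统应用), 2022, 31(7): 179-185.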

History

Received: 2021-10-14
Last revised: 2021-11-12
Published online: 2022-05-31
