Speech Enhancement Algorithm Based on Joint Maximum A Posteriori Probability
 Computer Systems & Applications (计算机系统应用), 2018, Vol. 27, Issue 12: 163-168

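1. College of Mechanical Engineering, Jiangnan University, Wuxi 214122, China;
2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, China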

Speech Enhancement Based on Joint Maximum A Posteriori Probability
LI Wan-Ling, ZHANG Qiu-Ju
College of Mechanical Engineering, Jiangnan University, Wuxi 214122, China;
Jiangsu Key Laboratory of Advanced Manufacturing Equipment & Technology, Wuxi 214122, China
Foundation item: National Natural Science Foundation of China (51575236)
Abstract: To address the shortcomings of traditional spectral subtraction, an improved spectral subtraction algorithm based on joint maximum a posteriori (MAP) probability is proposed. Traditional spectral subtraction reconstructs speech from the difference between the magnitude spectra of the noisy speech and the noise, combined with the phase of the noisy speech. This produces "musical noise", and because the phase estimate is inaccurate, enhancement at low signal-to-noise ratios is unsatisfactory. To overcome this, multiband spectral subtraction and phase estimation are introduced: the spectrum is divided into subbands and spectral subtraction is carried out in each subband, which reduces the influence of musical noise. In addition, a MAP phase estimator is constructed by combining the amplitude function and the phase function of the signal and iterating between them alternately. Experimental results show that, compared with traditional spectral subtraction, the proposed algorithm improves the perceptual quality and intelligibility of the enhanced speech at low signal-to-noise ratios.
Key words: speech enhancement     phase estimation     maximum a posteriori probability     speech intelligibility

1 Introduction

2 Improved Spectral Subtraction Based on Multiple Frequency Bands

 $x\left( n \right) = s\left( n \right) + d\left( n \right)$ (1)

 ${S_k} = {X_k} - {D_k} = A{\rm{exp}}\left( {j{\alpha _k}} \right) - C{\rm{exp}}\left( {j{\gamma _k}} \right)$ (2)

 ${\left| {{S_k}} \right|^2} = {\left| {{X_k}} \right|^2} - {\left| {{D_k}} \right|^2} - 2Re\left\{ {{S_k}D_k^*} \right\}$ (3)
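
As a quick numerical check of Eqs. (1)-(3), the sketch below picks arbitrary complex values for a clean-speech coefficient and a noise coefficient in one frequency bin (illustrative values, not taken from the paper), forms their sum, and compares the right-hand side of Eq. (3) with a power subtraction that ignores the cross term.

```python
import cmath

# Illustrative spectral coefficients for one frequency bin (assumed values).
S = cmath.rect(1.0, 0.7)    # clean speech: magnitude 1.0, phase 0.7 rad
D = cmath.rect(0.5, -1.2)   # noise: magnitude 0.5, phase -1.2 rad
X = S + D                   # noisy speech in the DFT domain, as in Eq. (2)

# Cross term 2 Re{S_k D_k^*} from Eq. (3).
cross_term = 2.0 * (S * D.conjugate()).real

# Right-hand side of Eq. (3): recovers |S_k|^2 exactly.
rhs = abs(X) ** 2 - abs(D) ** 2 - cross_term

# Power subtraction without the cross term: off by exactly cross_term.
naive = abs(X) ** 2 - abs(D) ** 2

print(abs(S) ** 2, rhs, naive)
```

The gap between `naive` and `rhs` depends on the relative phase of speech and noise, which is one way to see why the phase matters when the noise is strong.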

 $\begin{array}{l}\left| {{{\hat S}_i}\left( {{\omega _k}} \right)} \right| = {\left( {{{\left| {{X_i}\left( {{\omega _k}} \right)} \right|}^2} - {\alpha _i}{\delta _i}{{\left| {{{\hat D}_i}\left( {{\omega _k}} \right)} \right|}^2}} \right)^{1/2}},\;\;{b_i} \le {\omega _k} \le {e_i}\end{array}$ (4)
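
A minimal sketch of the per-band subtraction in Eq. (4), assuming NumPy. The function name `multiband_subtract`, the inclusive `(b_i, e_i)` bin-index convention, and the clipping of negative differences to zero are assumptions made for illustration. The factors α_i and δ_i are commonly described in the multiband spectral subtraction literature as an over-subtraction factor and a band-specific tweaking factor; here they are simply passed in per band.

```python
import numpy as np

def multiband_subtract(noisy_power, noise_power, band_edges, alpha, delta):
    """Apply Eq. (4) band by band and return the enhanced magnitude spectrum.

    noisy_power : |X_i(w_k)|^2 for every frequency bin
    noise_power : estimated |D_i(w_k)|^2 for every frequency bin
    band_edges  : list of (b_i, e_i) bin indices, inclusive on both ends
    alpha, delta: one factor per band
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Bins not covered by any band keep the noisy power unchanged.
    enhanced_power = noisy_power.copy()
    for (b, e), a_i, d_i in zip(band_edges, alpha, delta):
        diff = noisy_power[b:e + 1] - a_i * d_i * noise_power[b:e + 1]
        # Assumption: negative differences are clipped to zero so the
        # square root stays real; the source does not state a floor rule.
        enhanced_power[b:e + 1] = np.maximum(diff, 0.0)
    return np.sqrt(enhanced_power)

noisy = [4.0, 9.0, 16.0, 1.0]
noise = [1.0, 1.0, 2.0, 2.0]
mag = multiband_subtract(noisy, noise, [(0, 1), (2, 3)],
                         alpha=[3.0, 1.0], delta=[1.0, 1.5])
print(mag)
```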

 ${\varphi _{dev}} = \alpha - \beta$ (5)

 Fig. 1 Phase deviation of the signal

3 Improved Spectral Subtraction Based on Joint MAP

3.1 Joint MAP Estimation

 $\left\{ {\hat B,\hat \beta } \right\} = \mathop {{\rm{argmax}}}\limits_{B,\beta } \frac{{p\left( {X\left| {B,\beta } \right.} \right)p\left( {B,\beta } \right)}}{{p\left( X \right)}}$ (6)

 $p\left( {X{\rm{|}}B,\beta } \right) = \frac{1}{{\pi \sigma _d^2}}{\rm{exp}}\left( { - \frac{{{{\left| {X - B{e^{j\beta }}} \right|}^2}}}{{\sigma _d^2}}} \right)$ (7)

 $p\left( B \right) = \frac{{{\mu ^{v + 1}}}}{{{\rm{\Gamma }}\left( {v + 1} \right)}}\frac{{{B^v}}}{{\sigma _s^{v + 1}}}\exp \left( { - \frac{{\mu B}}{{{\sigma _s}}}} \right)$ (8)

 $p\left( \beta \right) = \frac{{{\rm{exp}}\left( {\kappa {\rm{cos}}\left( {\beta - {\beta _\mu }} \right)} \right)}}{{2\pi {I_0}\left( \kappa \right)}}$ (9)
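
The two priors in Eqs. (8) and (9) can be evaluated directly. The sketch below uses only the Python standard library; the function names and all parameter values are assumptions chosen for illustration, and the modified Bessel function I_0 is summed from its power series. Integrating each density numerically is a simple way to confirm the normalising constants.

```python
import math

def bessel_i0(x, terms=40):
    """Modified Bessel function I0(x), summed from its power series."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def amplitude_prior(B, mu, v, sigma_s):
    """Eq. (8): prior density of the speech amplitude B >= 0."""
    scale = mu ** (v + 1) / (math.gamma(v + 1) * sigma_s ** (v + 1))
    return scale * B ** v * math.exp(-mu * B / sigma_s)

def phase_prior(beta, kappa, beta_mu):
    """Eq. (9): von Mises prior density of the speech phase."""
    return math.exp(kappa * math.cos(beta - beta_mu)) / (2.0 * math.pi * bessel_i0(kappa))

# Illustrative parameter values (assumptions, not taken from the paper).
mu, v, sigma_s = 1.5, 1.0, 1.0
kappa, beta_mu = 4.0, 0.5

# Midpoint-rule integrals: both densities should integrate to about 1.
n = 20000
dB = 40.0 / n
amp_total = sum(amplitude_prior((i + 0.5) * dB, mu, v, sigma_s) for i in range(n)) * dB
dbeta = 2.0 * math.pi / n
phase_total = sum(phase_prior(-math.pi + (i + 0.5) * dbeta, kappa, beta_mu)
                  for i in range(n)) * dbeta
print(amp_total, phase_total)
```

A larger κ concentrates the phase prior around β_μ, while κ = 0 reduces it to a uniform density of 1/(2π).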

 $\left\{ {{{\hat B}^{MAP}},{{\hat \beta }^{MAP}}} \right\} = \mathop {{\rm{argmax}}}\limits_{B,\beta } {L_1}\left( {B,\beta } \right)$ (10)

 ${L_1}\left( {B,\beta } \right) = {B^v}{\rm{exp}}\left( { - \frac{{{{\left| {X - B{e^{j\beta }}} \right|}^2}}}{{\sigma _d^2}} - \frac{{\mu B}}{{{\sigma _s}}} + \kappa {\rm{cos}}\left( {\beta - {\beta _\mu }} \right)} \right)$ (11)

 ${L_2}\left( {B,\beta } \right) = v{\rm{log}}\left( B \right) - \frac{{{{\left| {X - B{e^{j\beta }}} \right|}^2}}}{{\sigma _d^2}} - \frac{{\mu B}}{{{\sigma _s}}} + \kappa {\rm{cos}}\left( {\beta - {\beta _\mu }} \right)$ (12)
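
As a worked step, the only term of Eq. (12) that couples $B$ and $\beta$ is the squared error. Writing the noisy coefficient as $X = A\exp \left( {j\alpha } \right)$, in line with Eq. (2), it expands to

 ${\left| {X - B{e^{j\beta }}} \right|^2} = {A^2} + {B^2} - 2AB\cos \left( {\alpha - \beta } \right)$

so that

 $\frac{\partial }{{\partial \beta }}{\left| {X - B{e^{j\beta }}} \right|^2} = - 2AB\sin \left( {\alpha - \beta } \right),\;\;\frac{\partial }{{\partial B}}{\left| {X - B{e^{j\beta }}} \right|^2} = 2B - 2A\cos \left( {\alpha - \beta } \right)$

Dividing by $- \sigma _d^2$ and adding the derivatives of the remaining terms of Eq. (12) gives the two stationarity conditions.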

 $\frac{{\partial {L_2}\left( {B,\beta } \right)}}{{\partial \beta }} = \frac{{2AB}}{{\sigma _d^2}}\sin \left( {\alpha - \beta } \right) - \kappa \sin \left( {\beta - {\beta _\mu }} \right) = 0$ (13)

 ${\hat \beta ^{MAP}} = {\rm{g}}\left( B \right) = {\rm{ta}}{{\rm{n}}^{ - 1}}\left( {\frac{{2AB{\rm{sin}}\alpha + \kappa \sigma _d^2{\rm{ sin}}{\beta _\mu }}}{{2AB{\rm{cos}}\alpha + \kappa \sigma _d^2{\rm{ cos}}{\beta _\mu }}}} \right)$ (14)
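
A minimal sketch of the phase update in Eq. (14), using only the standard library. The function names and the numerical values are assumptions for illustration. `math.atan2` is used on the assumption that a four-quadrant inverse tangent is intended, since a plain arctangent of the ratio would lose the quadrant. The example then confirms that the estimate satisfies the stationarity condition of Eq. (13).

```python
import math

def map_phase(B, A, alpha, sigma_d2, kappa, beta_mu):
    """Eq. (14): MAP phase estimate for a fixed amplitude B."""
    num = 2.0 * A * B * math.sin(alpha) + kappa * sigma_d2 * math.sin(beta_mu)
    den = 2.0 * A * B * math.cos(alpha) + kappa * sigma_d2 * math.cos(beta_mu)
    return math.atan2(num, den)

def phase_gradient(B, beta, A, alpha, sigma_d2, kappa, beta_mu):
    """Left-hand side of Eq. (13)."""
    return (2.0 * A * B / sigma_d2) * math.sin(alpha - beta) \
        - kappa * math.sin(beta - beta_mu)

# Illustrative values (assumptions): noisy amplitude/phase, noise variance,
# prior concentration and prior mean phase.
A, alpha, sigma_d2 = 1.2, 2.5, 0.5
kappa, beta_mu = 4.0, 2.0
B = 0.8

beta_hat = map_phase(B, A, alpha, sigma_d2, kappa, beta_mu)
residual = phase_gradient(B, beta_hat, A, alpha, sigma_d2, kappa, beta_mu)
print(beta_hat, residual)
```

The estimate lies between the noisy phase α and the prior mean β_μ: with κ = 0 it falls back to the noisy phase, and a larger κ pulls it towards β_μ.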

 $\frac{{\partial {L_2}\left( {B,\beta } \right)}}{{\partial B}} = \frac{v}{B} - \frac{{2B - 2A{\rm{cos}}\left( {\alpha - \beta } \right)}}{{\sigma _d^2}} - \frac{\mu }{{{\sigma _s}}} = 0$ (15)

 ${B^2} - \left( {A{\rm{cos}}\left( {\alpha - \beta } \right) - \frac{{\mu \sigma _d^2}}{{2{\sigma _s}}}} \right)B - \frac{{v\sigma _d^2}}{2} = 0$ (16)

 $\begin{aligned}{{\hat B}^{MAP}} = f\left( \beta \right) = & \frac{1}{2}\left( {A\cos \left( {\alpha - \beta } \right) - \frac{{\mu \sigma _d^2}}{{2{\sigma _s}}}} \right)\\ & + \frac{1}{2}\sqrt {{{\left( {A\cos \left( {\alpha - \beta } \right) - \frac{{\mu \sigma _d^2}}{{2{\sigma _s}}}} \right)}^2} + 2v\sigma _d^2} \end{aligned}$ (17)
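
A minimal sketch of the amplitude update, computed as the positive root of the quadratic in Eq. (16) by the standard quadratic formula. Function names and numerical values are assumptions for illustration. The example checks that the root makes the derivative in Eq. (15) vanish.

```python
import math

def map_amplitude(beta, A, alpha, sigma_d2, mu, v, sigma_s):
    """Positive root of the quadratic in Eq. (16) for a fixed phase beta."""
    b = A * math.cos(alpha - beta) - mu * sigma_d2 / (2.0 * sigma_s)
    # Roots of B^2 - b*B - v*sigma_d2/2 = 0; the '+' root is the positive one.
    return 0.5 * (b + math.sqrt(b * b + 2.0 * v * sigma_d2))

def amplitude_gradient(B, beta, A, alpha, sigma_d2, mu, v, sigma_s):
    """Left-hand side of Eq. (15)."""
    return v / B - (2.0 * B - 2.0 * A * math.cos(alpha - beta)) / sigma_d2 - mu / sigma_s

# Illustrative values (assumptions, not taken from the paper).
A, alpha, sigma_d2 = 1.2, 2.5, 0.5
mu, v, sigma_s = 1.5, 1.0, 1.0
beta = 2.2

B_hat = map_amplitude(beta, A, alpha, sigma_d2, mu, v, sigma_s)
residual = amplitude_gradient(B_hat, beta, A, alpha, sigma_d2, mu, v, sigma_s)
print(B_hat, residual)
```

For v > 0 the square root always exceeds |b|, so the returned amplitude is strictly positive even when the cosine term is negative.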

Let m denote the iteration index; the joint MAP estimator is then updated as:

 $\left\{ {\begin{array}{*{20}{c}} {{{\hat \beta }_{m + 1}} = g\left( {{{\hat B}_m}} \right)} \\ {{{\hat B}_{m + 1}} = f\left( {{{\hat \beta }_m}} \right)} \end{array}} \right.$ (18)

 ${E_{m + 1}} = \sum {\left| {{{\hat B}_{m + 1}}{\rm{exp}}\left( {j{{\hat \beta }_{m + 1}}} \right) - {{\hat B}_m}{\rm{exp}}\left( {j{{\hat \beta }_m}} \right)} \right|^2}$ (19)
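
The alternating iteration can be sketched for a single frequency bin as follows, using only the standard library. The phase update follows Eq. (14), the amplitude update is the positive root of Eq. (16), and both use the values of iteration m. The stopping rule applies Eq. (19) with the sum reduced to one bin. The function name `joint_map`, the initialisation from the noisy observation, the tolerance, and all numerical values are assumptions made for illustration.

```python
import cmath
import math

def joint_map(X, sigma_d2, mu, v, sigma_s, kappa, beta_mu, tol=1e-20, max_iter=100):
    """Alternate the phase and amplitude updates for one frequency bin."""
    A, alpha = abs(X), cmath.phase(X)
    # Assumption: start from the noisy amplitude and phase.
    B, beta = A, alpha
    for m in range(1, max_iter + 1):
        # Phase update, Eq. (14), from the amplitude of iteration m.
        beta_new = math.atan2(
            2.0 * A * B * math.sin(alpha) + kappa * sigma_d2 * math.sin(beta_mu),
            2.0 * A * B * math.cos(alpha) + kappa * sigma_d2 * math.cos(beta_mu),
        )
        # Amplitude update: positive root of Eq. (16), from the phase of iteration m.
        b = A * math.cos(alpha - beta) - mu * sigma_d2 / (2.0 * sigma_s)
        B_new = 0.5 * (b + math.sqrt(b * b + 2.0 * v * sigma_d2))
        # Eq. (19) restricted to a single bin: squared change of the estimate.
        err = abs(cmath.rect(B_new, beta_new) - cmath.rect(B, beta)) ** 2
        B, beta = B_new, beta_new
        if err < tol:
            break
    return B, beta, m

# Illustrative values (assumptions, not taken from the paper).
X = cmath.rect(1.2, 2.5)
B_hat, beta_hat, iters = joint_map(X, sigma_d2=0.5, mu=1.5, v=1.0, sigma_s=1.0,
                                   kappa=4.0, beta_mu=2.0)
print(B_hat, beta_hat, iters)
```

At convergence the pair satisfies both stationarity conditions, Eqs. (13) and (15), at the same time, which neither closed-form update guarantees on its own.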

 Fig. 2 Alternating iterations of the joint MAP estimator

3.2 Multiband Spectral Subtraction Based on Joint MAP

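(1) Pre-process the noisy speech, transform it to the frequency domain with the Fourier transform, and compute its power spectrum;

(2) Divide the spectrum into subbands and apply joint MAP estimation to estimate the phase spectrum of each subband;

(3) Compute the power spectra of the noise and of the noisy speech in each subband;

(4) Compute the magnitude spectrum of the enhanced speech in each subband according to Eq. (4);

(5) Reconstruct the signal and apply the inverse Fourier transform.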
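
The five steps above can be sketched for a single frame as follows, assuming NumPy. This is a simplified illustration rather than the paper's implementation: the function name `enhance_frame`, the Hann window, the clipping of negative power to zero, and the choice to pass the phase estimate of step (2) in as an argument (falling back to the noisy phase when none is given) are all assumptions, and framing with overlap-add is left out.

```python
import numpy as np

def enhance_frame(frame, noise_power, band_edges, alpha, delta, phase_estimate=None):
    """Steps (1)-(5) for one frame of noisy speech."""
    frame = np.asarray(frame, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)

    # Step (1): window the frame, transform it, and compute the power spectrum.
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    noisy_power = np.abs(spectrum) ** 2

    # Step (2): per-bin phase. Assumption: an externally computed estimate
    # (for example from a joint MAP estimator) is supplied; otherwise the
    # noisy phase is used.
    if phase_estimate is None:
        phase = np.angle(spectrum)
    else:
        phase = np.asarray(phase_estimate, dtype=float)

    # Steps (3)-(4): subtract the scaled noise power in each subband, Eq. (4).
    magnitude = np.sqrt(noisy_power)
    for (b, e), a_i, d_i in zip(band_edges, alpha, delta):
        diff = noisy_power[b:e + 1] - a_i * d_i * noise_power[b:e + 1]
        magnitude[b:e + 1] = np.sqrt(np.maximum(diff, 0.0))  # assumed floor at zero

    # Step (5): recombine magnitude and phase, then invert the transform.
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=len(frame))

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
bands = [(0, 42), (43, 85), (86, 128)]   # 129 bins for a 256-sample frame
out = enhance_frame(frame, np.zeros(129), bands, alpha=[1.0] * 3, delta=[1.0] * 3)
```

With a zero noise estimate and the noisy phase, the output reproduces the windowed input, which is a convenient sanity check before plugging in a real noise estimate and phase estimator.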

4 Experiments and Analysis

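 Fig. 3 White noise

 Fig. 4 Pink noise

 Fig. 5 Babble noise

 Fig. 6 PESQ scores of each algorithm under different background noises

 Fig. 7 STOI values of each algorithm under different background noises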

5 Conclusion
