Abstract: For multi-speaker communication scenarios in real life, an effective solution is proposed that can extract the speech of any target speaker from any direction. It builds on research into two techniques: underdetermined blind speech separation based on the azimuth information of the target sound source with nonlinear time-frequency masking, and back-propagation (BP) speaker recognition. The solution has two stages: target speech search and target speech extraction. The search stage uses BP speaker recognition. The extraction stage uses underdetermined blind speech separation based on sound source azimuth information, with improved potential-function clustering and nonlinear time-frequency masking. The results show that the solution is feasible and can effectively extract the target speaker's speech from a mixed speech stream regardless of the speaker's position. The average signal-to-noise ratio gain (SNRG) is 8.68 dB, the similarity coefficient is 85%, the recognition rate is 61%, and the running time is 20.6 s.